Topic homework: Interpretability with SHAP

In this notebook we explore the open source library SHAP for interpreting black-box machine learning
models. SHAP comes with many clear benefits:

It allows us to understand the key factors of heterogeneity in complex models - such as neural
networks or boosted trees
It can be calculated quickly and expressed visually - no need to fit multiple models

Let's see an example of SHAP plots in action:

pip install econml

pip install shap

## Ignore warnings
import warnings
warnings.filterwarnings("ignore")

from econml.dml import CausalForestDML, LinearDML, NonParamDML
from econml.dr import DRLearner
from econml.metalearners import DomainAdaptationLearner, XLearner
from econml.iv.dr import LinearIntentToTreatDRIV
import numpy as np
import scipy.special
import matplotlib.pyplot as plt
import shap
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.linear_model import Lasso

import sklearn

np.random.seed(123)
n_samples = 5000
n_features = 10
true_te = lambda X: (X[:, 0] > 0) * X[:, 0]
X = np.random.normal(0, 1, size=(n_samples, n_features))
W = np.random.normal(0, 1, size=(n_samples, n_features))
T = np.random.binomial(1, scipy.special.expit(X[:, 0]))
y = true_te(X) * T + 5.0 * X[:, 0] + np.random.normal(0, .1, size=(n_samples,))
X_test = X[:min(100, n_samples)].copy()
X_test[:, 0] = np.linspace(np.percentile(X[:, 0], 1), np.percentile(X[:, 0], 99), min(100, n_samples))

Here we fit a Forest Double Machine Learning estimator, a forest model with residualization, to the
synthetic data. The data was generated so that the first feature has a strong causal effect,
while the other features are random noise with no effect on the outcome.
est = CausalForestDML(random_state=123)
est.fit(y, T, X=X, W=W)
shap_values = est.shap_values(X[:20])
shap.summary_plot(shap_values['Y0']['T0'])


It's important to note that a Shapley value is calculated for each of the 20 rows passed to the
est.shap_values() function. The plot shows those 20 points with a random vertical jitter to avoid
overlapping points. As a result, there is not a single Shapley value per feature, but a Shapley value per
feature per observation.
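
As a quick check of this, here is a minimal sketch (assuming, as in the indexing above, that est.shap_values() returns shap.Explanation objects keyed by outcome and treatment):

vals = shap_values['Y0']['T0'].values   # the raw Shapley value matrix
print(vals.shape)                       # one row per explained observation, one column per feature -> expected (20, 10)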

The SHAP plot clearly indicates that high values of the first feature have a significant impact on the
model output. But what does this impact mean?

We investigate the documentation of SHAP plots to better understand what SHAP represents.

A simpler example is to compare SHAP against a linear model, where we have coefficients to interpret:

# a classic housing price dataset
X, y = shap.datasets.boston()
X100 = shap.utils.sample(X, 100)  # 100 instances for use as the background distribution

# a simple linear model
model = sklearn.linear_model.LinearRegression()
model.fit(X, y)

print("Model coefficients:\n")
for i in range(X.shape[1]):
    print(X.columns[i], "=", model.coef_[i].round(4))

Model coefficients:

CRIM = -0.108
ZN = 0.0464
INDUS = 0.0206
CHAS = 2.6867
NOX = -17.7666
RM = 3.8099
AGE = 0.0007
DIS = -1.4756
RAD = 0.306
TAX = -0.0123
PTRATIO = -0.9527
B = 0.0093
LSTAT = -0.5248

Here we see the linear coefficients we are familiar with. However, the value of a coefficient depends on
the scale of its feature, so its absolute value is not indicative of its importance.
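
To make this concrete, here is a minimal sketch reusing X, y, and model from above (X_scaled, model2, and tax_idx are illustrative names): rescaling a feature changes its coefficient by the same factor, even though the fitted model and its predictions are unchanged.

# Rescale one feature and refit: the fit is identical, but the coefficient changes with the units
X_scaled = X.copy()
X_scaled["TAX"] = X_scaled["TAX"] / 1000.0             # express TAX in thousands
model2 = sklearn.linear_model.LinearRegression().fit(X_scaled, y)
tax_idx = list(X.columns).index("TAX")
print(model.coef_[tax_idx], model2.coef_[tax_idx])     # the second is ~1000x the first
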
Instead, the authors of SHAP suggest a partial dependence plot; here we see it plotted for one feature,
RM.

shap.partial_dependence_plot(
    "RM", model.predict, X100, ice=False,
    model_expected_value=True, feature_expected_value=True
)

We see that, because this model is linear, the expected value of the model's predictions as RM increases
(with all the other features marginalized out) follows the straight blue line.
To calculate SHAP values, we evaluate the model 𝑓 restricted to a subset of features 𝑆, which is
done by integrating out the other features using a conditional expectation, 𝑓_S(x_S) = E[𝑓(X) | X_S = x_S]. As a result, we
see how the predicted function changes as the given feature changes.
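
For reference, the quantity being approximated is the classic Shapley value: each feature's contribution is its average marginal effect over all subsets 𝑆 of the full feature set 𝐹,

$$ \phi_i \;=\; \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f_{S \cup \{i\}}\!\left(x_{S \cup \{i\}}\right) - f_S\!\left(x_S\right) \right] $$

where 𝑓_S is the model with the features outside 𝑆 integrated out, as described above.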

explainer = shap.Explainer(model.predict, X100)

shap_values = explainer(X)

# make a standard partial dependence plot
sample_ind = 18
shap.partial_dependence_plot(
    "RM", model.predict, X100, model_expected_value=True,
    feature_expected_value=True, ice=False,
    shap_values=shap_values[sample_ind:sample_ind+1, :]
)

Permutation explainer: 507it [00:24, 15.98it/s]

From here, at a given observation 𝑥𝑖 (recall that Shapley values are calculated at each observation) the
deviation of the model from the model's mean prediction (shown by the red line above) is approximately
-3.09, which is the Shapley value for this observation and feature.
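
To connect this single number back to the prediction, Shapley values for an observation are additive: the explainer's expected value plus that row's per-feature SHAP values recovers the model's prediction. A minimal sketch, reusing model, X, sample_ind, and shap_values from above (the two printed numbers should agree up to the explainer's approximation error):

# Additivity check: base value + sum of the row's SHAP values ~= the model's prediction
row = shap_values[sample_ind]
print(model.predict(X.iloc[[sample_ind]])[0])
print(row.base_values + row.values.sum())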

shap.plots.waterfall(shap_values[sample_ind], max_display=14)

In this notebook we took a deeper dive into Shapley plots and learned that:

Shapley plots can show the impact of a feature, and are not affected by feature scale the way linear
coefficients are
Shapley values are calculated per observation and per feature; they marginalize out the other features
and look at the change in the prediction, given that feature's value, relative to the mean prediction
Shapley plots are a fast and visual way of making complex models more interpretable.
