Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

I will share some recent results on improving interpretability when you already have a model (post-training interpretability), along with our work on ways to test interpretability methods. Among them, I will take a deeper dive into one of my recent works, Testing with Concept Activation Vectors (TCAV), a post-training interpretability method for complex models such as neural networks. This method provides an interpretation of a neural net's internal state in terms of human-friendly, high-level concepts rather than low-level input features. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use concept activation vectors (CAVs) as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result, for example, how sensitive a prediction of "zebra" is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
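To make the idea concrete, here is a minimal sketch of the procedure the abstract describes: fit a linear classifier separating activations of concept examples from activations of random examples to obtain a CAV, then score how often the directional derivative of the target-class logit along that CAV is positive. This is an illustrative sketch, not the authors' reference implementation; the helpers `layer_acts` and `grad_logit_wrt_layer` are assumed hooks into your own model that return layer activations and gradients of a class logit with respect to those activations.

```python
# Illustrative TCAV sketch (assumptions noted above; not the official code).
import numpy as np
from sklearn.linear_model import LogisticRegression


def compute_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear classifier separating concept vs. random activations.
    The Concept Activation Vector is the unit-normalized normal to the
    resulting decision boundary."""
    X = np.concatenate([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)


def tcav_score(class_grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of class examples whose logit increases when the layer
    activation moves in the concept direction (positive directional
    derivative). class_grads[i] = d(logit_k) / d(activations) for example i."""
    directional_derivs = class_grads @ cav
    return float(np.mean(directional_derivs > 0))


# Example usage (hypothetical model hooks; shapes are (n_examples, n_units)):
#   stripe_acts = layer_acts(model, layer="mixed_4c", images=stripe_images)
#   random_acts = layer_acts(model, layer="mixed_4c", images=random_images)
#   cav = compute_cav(stripe_acts, random_acts)
#   zebra_grads = grad_logit_wrt_layer(model, layer="mixed_4c",
#                                      images=zebra_images, target_class="zebra")
#   print("TCAV score for 'stripes' on 'zebra':", tcav_score(zebra_grads, cav))
```

A TCAV score near 1.0 would indicate that moving activations in the concept direction almost always increases the class logit, i.e., the concept is important to that prediction, while a score near 0.5 suggests no consistent influence.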
Session Summary
Interpretability Beyond Feature Attribution
MLconf 2018 San Francisco
Been Kim
Google Brain
Sr. Research Scientist