Blogpost for the paper 'Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)'.


Written by Federico Acosta. Assisted by Mahdi Saleh, M.Sc.


Introduction


Due to the size and complexity of deep learning models, their interpretation has become a challenge. This paper introduces Concept Activation Vectors (CAVs), a newly devised tool that interprets the internal state of a neural network in terms of concepts that are easily understandable by humans. CAVs are the core of Testing with CAVs (TCAV), a technique that quantifies how important a human-defined concept is to the network's classification process.


Interpretability methods in Neural Networks

There are two common ways to make models interpretable: (1) use an inherently interpretable model, or (2) post-process the model to gain insights into it. Since option 1 can be quite costly for a working model that already achieves high performance, the objective of the paper is to develop a post-processing method that correctly interprets the complex internals of the model.

The authors also set themselves four goals for the development of this method:

  1. Accessibility: the user needs no ML expertise to understand the process.
  2. Customization: can be adapted to any concept the user wants to examine.
  3. Plug-in readiness: works without retraining or modifying the ML model.
  4. Global quantification: outputs a single measure for entire sets of examples, not just for single inputs.


To provide a deeper understanding of TCAV and how it achieves the goals described above, the paper reviews current state-of-the-art methods and compares their results. Saliency methods, for example, produce a map showing how important each pixel of a particular image is. Saliency maps aim to identify important regions of the image and provide a pixel-wise quantification. Nevertheless, they have two easily identifiable limitations: 1) a saliency map is conditioned on a single image, so it must be reviewed one example at a time; 2) the user cannot steer the method towards a specific concept to be explored; in particular, the identification of concepts is influenced by the interpreter's subjective perception of the image.
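As a rough illustration of how such a map is produced, the sketch below computes a vanilla gradient saliency map in the spirit of Simonyan et al. [3]. It assumes a PyTorch image classifier `model` and a single input tensor `image` of shape (1, 3, H, W); these names, and the helper function itself, are illustrative and not taken from any of the referenced implementations.

```python
import torch

def saliency_map(model, image, target_class):
    """Vanilla gradient saliency: |d logit_k / d pixel|, maximised over color channels."""
    model.eval()
    image = image.detach().clone().requires_grad_(True)  # track gradients w.r.t. the pixels
    logits = model(image)                                 # forward pass, shape (1, num_classes)
    logits[0, target_class].backward()                    # gradient of the class-k logit
    return image.grad.abs().max(dim=1)[0].squeeze(0)      # per-pixel importance, shape (H, W)
```

Note that the output is tied to one image and one class, which is exactly the single-example limitation discussed above.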

Methodology

The methodology covers two ideas: 1. the use of directional derivatives to quantify the sensitivity of an ML model towards human-defined concepts, and 2. how to compute a quantitative explanation (the TCAV score) for each tested concept over the NN.



Concept Activation Vectors (CAVs)

To obtain a CAV, one gathers a set of examples that represent the concept to be tested. The activations these images produce at a chosen layer l of the model, together with the activations of an equal number of random images, are then separated by a linear classifier. The Concept Activation Vector (CAV) is defined as the normal to the hyperplane that separates the activations of the random images from the activations of the images representing the concept.
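A minimal sketch of this step is shown below. It assumes the layer-l activations of the concept images and of the random images have already been collected into the numpy arrays `concept_acts` and `random_acts` (shape n_examples × n_features); the use of scikit-learn's logistic regression is one possible choice of linear classifier, not necessarily the one used by the authors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)),    # 1 = concept examples
                        np.zeros(len(random_acts))])   # 0 = random counterexamples
    clf = LogisticRegression(max_iter=1000).fit(X, y)  # linear classifier in activation space
    cav = clf.coef_[0]                                 # normal to the separating hyperplane
    return cav / np.linalg.norm(cav)                   # unit-length CAV
```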

Using CAVs, one can measure how sensitive the ML model's predictions are to changes of the input in the direction of the concept, all at a chosen layer of the NN.

The conceptual sensitivity of the model can then be computed as the directional derivative S_{C,k,l}(x) shown below:

S_{C,k,l}(x) = \lim_{\epsilon \to 0} \frac{h_{l,k}(f_l(x) + \epsilon v_C^l) - h_{l,k}(f_l(x))}{\epsilon} = \nabla h_{l,k}(f_l(x)) \cdot v_C^l

where f_l maps an input x to its activations at layer l, h_{l,k} maps those activations to the logit of class k, and v_C^l is the unit CAV for concept C at layer l.

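In code, the directional derivative reduces to a dot product between the gradient of the class-k logit with respect to the layer-l activations and the CAV. The sketch below assumes the network has been split into two PyTorch callables, `f_l` (input to layer-l activations) and `h_lk` (activations to the scalar logit of class k); this split and the variable names are assumptions for illustration, not the authors' code.

```python
import torch

def conceptual_sensitivity(f_l, h_lk, x, cav):
    acts = f_l(x).detach().requires_grad_(True)   # f_l(x): activations at layer l
    h_lk(acts).backward()                         # gradient of the scalar class-k logit
    grad = acts.grad.flatten()
    return torch.dot(grad, cav.flatten()).item()  # directional derivative along the CAV
```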
Testing with CAVs (TCAV)

Testing with CAVs (TCAV) uses CAVs to compute the sensitivity to a specific concept across an entire class of inputs. Assume k is a class label and X_k denotes all inputs with that label. The TCAV score is then the fraction of class-k inputs whose conceptual sensitivity is positive:

\mathrm{TCAV}_{Q_{C,k,l}} = \frac{|\{x \in X_k : S_{C,k,l}(x) > 0\}|}{|X_k|}

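Given the per-example sensitivities, the score is just a fraction. The sketch below assumes `sensitivities` holds S_{C,k,l}(x) for every x in X_k, e.g. computed with the function sketched above; the name is illustrative.

```python
def tcav_score(sensitivities):
    positive = sum(1 for s in sensitivities if s > 0)  # inputs pushed towards class k by the concept
    return positive / len(sensitivities)               # fraction of X_k with positive sensitivity
```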
Results

Gaining insight using TCAV

The TCAV process was first tried on GoogLeNet and Inception V3, testing concepts with various levels of abstraction on both architectures and applying the method at different layers of the networks.

The results of the experiment can be seen in the images below.


Figure 2. TCAV concept tests applied to GoogLeNet and Inception V3.


Figure 3. Results of accuracy testing in different layers. Basic concepts show a better performance at lower layers, while more abstract or complex concepts improve with layer depth.


The experiment confirms results presented in previous works (Zeiler & Fergus, 2014 [7]): lower layers act as low-level feature detectors (e.g., edges, colors), while higher layers combine lower-layer features to detect higher-level features.

A controlled experiment with ground truth

The authors of the paper conducted a second experiment with the purpose of demonstrating how TCAV can be used to interpret the function learned by a neural network. Later, they compared the results with an evaluation of saliency maps.

For this experiment, they prepared arbitrary datasets in which captions were written over the images. The captions were generated at three different noise levels: 0% noisy, meaning every caption matches the image's label; 30% noisy, meaning 30% of the captions contain random text (for example, 30% of cab images would carry words like "cat" or "dog" instead of "cab"); and 100% noisy, meaning all captions contain random text. See the image below for reference:

Figure 4. Examples of test performed with different noisy captions written on dataset images 
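For intuition, a caption with a given noise level could be produced along the lines of the sketch below, which uses Pillow to draw text onto an image; the function name and parameters are illustrative and not the authors' actual data pipeline.

```python
import random
from PIL import Image, ImageDraw

def add_caption(image, true_label, other_labels, noise_level):
    """Write the true label, or with probability `noise_level` a random other label."""
    caption = true_label if random.random() >= noise_level else random.choice(other_labels)
    captioned = image.copy()
    ImageDraw.Draw(captioned).text((10, 10), caption, fill="white")  # overlay the caption text
    return captioned
```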


That dataset was then tested on a trained network that classified images according to the objects in them, not according to the captions. The test concluded that, according to TCAV, the concept of the image content was more important than the concept of the caption, and this was also related to the accuracy of the network, as seen in the image below.

Figure 5. Quantitative results of TCAV method on noisy caption dataset 


The same exercise was also tried with saliency maps. But as seen in the images below, it is hard to tell from the maps whether the network really considers the captions for classification or not, regardless of the noise level, and even when the network was not trained on a dataset that included captions.

Figure 6. Saliency maps of test performed on noisy caption image dataset

TCAV for medical application

Finally, TCAV was put to the test in a medical setting. This example shows how the method responds to a network trained to classify different stages of diabetic retinopathy; here, the TCAV results were also analyzed by an expert ophthalmologist. Green in the charts represents concepts that should be considered when diagnosing that stage of diabetic retinopathy; red represents concepts that are not important or should not be considered for that classification.

Figure 7. Quantitative results of the TCAV method on the diabetic retinopathy classifier

The analysis shows that for high-accuracy classes such as stage 4 diabetic retinopathy (DR), the NN considers all the concepts that are important for doctors. On the other hand, for stage 1 DR, where the accuracy is somewhat lower, it is evident that the network considers the concept of aneurysms (HMA), which is not very relevant for that stage of DR but rather for stage 2. This suggests that TCAV can also help in understanding the model's errors and how to fix them.

Conclusion and future work


The paper draws two main conclusions from the various experiments performed with TCAV:

  • TCAV is a step forward towards an easy, human-friendly way of understanding the internal state of DL models.
  • The paper shows that CAVs do indeed correspond to their intended concepts and that they can be used both with standard image classification models and in specialized medical applications.

The following directions are proposed for future exploration of the method:

  • TCAV has yet to be applied to audio, video, sequences, and other domains, where it may yield new insights.
  • Concept attribution for networks with super-human performance could even help humans improve their own abilities.

In the future, TCAV might reveal insights into the performance of classification models that human experts had probably not considered. Seen that way, the method could also serve as an aid to human experts, giving them “advice”.

References 

[1] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres. "Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)." arXiv:1711.11279 (Jun 2018).

[2] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres. "Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)." ICML 2018 (slides).

[3] Karen Simonyan, Andrea Vedaldi, Andrew Zisserman. "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps." arXiv:1312.6034 (2013).

[4] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna. "Rethinking the Inception Architecture for Computer Vision." arXiv:1512.00567v3 (Dec 2015).

[5] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. "Going Deeper with Convolutions." arXiv:1409.4842 (Sep 2014).

[6] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim. "The (Un)reliability of Saliency Methods." arXiv:1711.00867v1 (Nov 2017).

[7] Matthew D. Zeiler, Rob Fergus. "Visualizing and Understanding Convolutional Networks." In European Conference on Computer Vision, pp. 818–833. Springer, 2014.
