This blog post summarizes and discusses the paper "Compositional explanations of neurons" by Jesse Mu and Jacob Andreas (Advances in Neural Information Processing Systems 33 (2020)).

Introduction and Motivation

Neural networks are known not only for their state-of-the-art performance, but also for being sub-symbolic black-box approaches that lack interpretability. Human interpretability of decisions, however, is increasingly important (1) for industry, which needs to explain decisions to customers, and (2) for society, which requires transparent and fair decisions. The field of explainable AI (XAI) aims at improving the interpretability of neural networks. Specifically, the paper “Compositional Explanations of Neurons” [1] addresses the interpretability of single-neuron behavior.

Previous work on "Network Dissection" [2] has already introduced atomic concepts to explain neuron behavior. The problem, however, is that atomic concepts are too simplistic and cannot explain more complex neuron behavior. Poor explanations are especially problematic (1) for industry, which needs to justify neural network decisions, and (2) for XAI tools [11] that aim at providing sophisticated methods for neural network analysis. Moreover, without automatic tools, experts need to inspect models manually, which is costly and time-intensive.

The authors’ solution is to automatically construct more complex explanations by combining atomic concepts into logical forms using composition operators. They show that compositional explanations better approximate neuron behavior in both Computer Vision (CV) and Natural Language Processing (NLP). Since the proposed method is fully automatic, it also removes the need for expensive manual model inspection.

Methodology and Experiments

First, the authors propose a general task- and model-independent framework for constructing compositional explanations, generalizing the idea of "Network Dissection" [2] to explanations of arbitrary length. Compositional explanations are logical forms L \in L(C), constructed inductively by combining atomic concepts C using \eta-ary composition operators \omega \in \Omega_\eta. The atomic concepts C are defined in so-called probing datasets.

A good compositional explanation L should explain how a single neuron n behaves. To evaluate this semantic alignment between n and L, the authors use Intersection over Union (IoU), comparing binary neuron behavior M_n(x) with the satisfiability of the logical form L(x) for each sample x in the probing dataset:
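
IoU(n, L) = \frac{\left|\{x : M_n(x) \wedge L(x)\}\right|}{\left|\{x : M_n(x) \vee L(x)\}\right|}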

The task of finding the best compositional explanation L^*_n \in L(C) for a neuron n is formalized as a search task over L(C):
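
L^*_n = \arg\max_{L \in L(C)} \text{IoU}(n, L)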

The search space L(C) grows exponentially in formula length. To find good explanations in this large search space, the authors use beam search with a beam width of 10 while restricting the maximum formula length.
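
A minimal sketch of such a beam search could look as follows. This is not the authors' implementation: the helper functions iou (scoring a formula against the neuron masks over the probing dataset) and compose (building a longer formula from an operator, a formula, and an atomic concept) as well as the operator set are assumptions for illustration.

def beam_search(atomic_concepts, iou, compose,
                ops=("and", "or", "and not"), beam_width=10, max_length=10):
    # Start with the best-scoring atomic concepts (formulas of length 1).
    beam = sorted(atomic_concepts, key=iou, reverse=True)[:beam_width]
    for _ in range(max_length - 1):
        candidates = list(beam)
        # Extend every formula in the beam by one operator and one atomic concept.
        for formula in beam:
            for op in ops:
                for concept in atomic_concepts:
                    candidates.append(compose(op, formula, concept))
        # Keep only the top beam_width candidates by IoU.
        beam = sorted(candidates, key=iou, reverse=True)[:beam_width]
    return beam[0]  # best explanation found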

Compositional Explanations in Computer Vision

Second, the authors study compositional explanations in image classification. For this, they probe a ResNet-18 [6] trained on the Places365 dataset [3]. As the probing dataset, they use the ADE20k scenes dataset [4], consisting of images x and segmentation masks, called concept masks C(x). These masks are pixel-wise annotations of 1,105 unique concepts, such as cars or cats. To construct more complex logical forms L \in L(C), they combine binary segmentation masks using the binary operators \Omega_2 = \{and, or\} and the unary operator not.

The authors analyze the 512 kernels in the last convolutional layer of ResNet-18. To this end, they forward images x from the probing dataset (Figure 1a) through ResNet-18 to obtain the output of each kernel n, that is, its activation map. Using bilinear interpolation, the activation maps are upsampled to the input dimensions (Figure 1b). By thresholding, they transform the upsampled activation maps into binary segmentation masks, called neuron masks M_n(x) (Figure 1c). Finally, they compare the neuron masks M_n(x) with concept masks C(x) (Figure 1d) and longer logical forms L(x) (Figure 1e) to evaluate their semantic alignment for each sample x in the probing dataset, calculating the Intersection over Union between the binary segmentation masks (Figure 1f).

Figure 1: Constructing compositional explanations: Example of probing unit n=483 in the last convolutional layer of ResNet-18.
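
The following is a rough sketch of a single probing step in PyTorch, not the authors' implementation: in particular, the paper determines the threshold per neuron over the whole probing dataset, which is simplified here to a per-image quantile.

import torch
import torch.nn.functional as F

def neuron_mask(activation_map, input_size, quantile=0.995):
    # activation_map: 2D tensor (H', W') with the activations of one kernel for one image.
    # Upsample to the input resolution with bilinear interpolation (Figure 1b).
    upsampled = F.interpolate(activation_map[None, None], size=input_size,
                              mode="bilinear", align_corners=False)[0, 0]
    # Threshold at a high activation quantile to obtain a binary neuron mask (Figure 1c).
    threshold = torch.quantile(upsampled.flatten(), quantile)
    return upsampled > threshold

def iou(mask_a, mask_b):
    # Intersection over Union between two binary masks (Figure 1f).
    intersection = (mask_a & mask_b).sum().item()
    union = (mask_a | mask_b).sum().item()
    return intersection / union if union > 0 else 0.0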

Compositional Explanations in Natural Language Processing

Third, the authors study compositional explanations in the task of Natural Language Inference (NLI): Given two sentences, a premise (P) and a hypothesis (H), classify whether the premise entails the hypothesis, contradicts it, or neither (cf. Figure 2). NLI is important for information retrieval, semantic parsing, and commonsense reasoning [5].


Figure 2: Example for the task of NLI.

Atomic concepts C are defined as binary indicator variables, indicating (1) whether P or H contains one of the 2,000 most common words, (2) whether P or H contains certain parts of speech (like adjectives), and (3) whether there is a 0%, 25%, 50% or 75% word overlap between the two sentences. The authors use the composition operators \Omega=\{and, or, not, neighbors\}, where neighbors is a unary operator capturing the five closest words of a word (measured by cosine similarity between GloVe embeddings [10]). An example is L(\text{pre, hyp}) = \text{(NOT pre:ADJ) AND NEIGHBORS(hyp:running)}, which is fulfilled if the premise does not contain any adjectives and the hypothesis contains a word similar to "running".
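
As a rough illustration, the neighbors operator could be sketched as follows. This is not the authors' code: the function names are hypothetical, and whether the anchor word itself counts as a match is an assumption.

import numpy as np

def nearest_neighbors(word, glove, k=5):
    # glove: dict mapping words to their GloVe embedding vectors (numpy arrays).
    # Returns the k words closest to `word` by cosine similarity.
    v = glove[word] / np.linalg.norm(glove[word])
    sims = {w: float(v @ (u / np.linalg.norm(u)))
            for w, u in glove.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def neighbors_holds(tokens, word, glove, k=5):
    # Indicator: does the sentence contain `word` or one of its k nearest neighbors?
    candidates = set(nearest_neighbors(word, glove, k)) | {word}
    return any(token in candidates for token in tokens)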

To evaluate logical forms, the authors use the Jaccard similarity: the number of samples for which the neuron fires and the sentence pair satisfies the logical form, divided by the number of samples for which either holds. A neuron is considered to fire if its post-ReLU activation is positive (M_n(x) := 1(f_n(x) > 0)). Note that Jaccard similarity is identical to Intersection over Union; the former term is simply more common in the NLP community.
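
A minimal sketch of this score (variable names are illustrative), computed over boolean vectors with one entry per sentence pair in the probing dataset:

import numpy as np

def jaccard(neuron_fires, formula_holds):
    # neuron_fires: boolean array, True where the neuron's post-ReLU activation is positive.
    # formula_holds: boolean array, True where the sentence pair satisfies the logical form.
    both = np.logical_and(neuron_fires, formula_holds).sum()
    either = np.logical_or(neuron_fires, formula_holds).sum()
    return float(both) / float(either) if either > 0 else 0.0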

Details include:

  • Probed model: Bidirectional LSTM from [7]
    • + 2 hidden FC-layers, final linear layer and softmax
  • Training dataset: Stanford NLI corpus (SNLI) [5]
  • Probing dataset: Validation set of SNLI (10k samples)
  • Probed units: Last hidden layer (1,024 units)
    • Only neurons that fire for at least 5% of the time

Results and Conclusions

1. Compositional explanations better approximate neuron behavior than atomic explanations

In both experiments, compositional explanations better approximate neuron behavior than atomic explanations, that is, formulas of length 1 (cf. Figure 3). Experiments in CV show that formulas of length 10 yield the best interpretability in terms of IoU and a statistically significant improvement over the length-1 formulas introduced by “Network Dissection” [2].

Figure 3: Comparing maximum formula length to interpretability, measured by IoU.

In CV, the authors additionally classify 128 randomly sampled compositional explanations and find that 69% of neurons learn meaningful abstractions or specializations (cf. Figure 4). In NLI, they find that neurons fire for rather simple heuristics, such as (1) specific words (man, sitting, eating, sleeping), (2) words that are strongly associated with one class, or (3) word overlap between premise and hypothesis.

Figure 4: Examples for neurons that fire for related (green) and unrelated (red) concepts.

2. Better interpretability does not imply higher accuracy

The authors further underline that better interpretability (measured by IoU) does not necessarily imply better model accuracy; rather, the relation depends on the defined concepts C: In CV, neurons that learn meaningful relations contribute to better accuracy, whereas in NLI, neurons that fire for shallow heuristics contribute to lower accuracy (Figure 5).

Figure 5: Interpretability (measured by IoU) can be correlated or anticorrelated with performance.

3. Compositional explanations can be used to change model behavior

Finally, the authors show that compositional explanations allow constructing adversarial examples: the input is changed such that it satisfies or violates the logical formulas of the neurons that contribute most to the classification decision (Figure 6). Since they observe similar behavior in ResNet-50 [6], AlexNet [8], and DenseNet [9], compositional explanations might indicate that neurons learn biases of the training set (for example, that the dataset does not contain swimming holes with blue water).

Figure 6: Invalidating the logical formulas of the neurons contributing most to the decision (e.g., by painting the water blue) causes a misclassification, even though the image still shows a swimming hole.

Discussion

The authors discuss several limitations of their approach, including that their method (1) is limited to explaining the behavior of single neurons, and (2) relies heavily on probing datasets, which might not exist, might be expensive to construct, or might not contain the concepts C of interest. Further, their experiments are limited to the last hidden layer. Finally, implementing their method requires expert knowledge.

Personal Review

The presented research is generally of high quality, because (1) it addresses an important problem as motivated in the introduction, (2) it is a novel contribution to the field of explainable AI, and (3) it has many applications in science and industry. However, I would like to address weaknesses that go beyond those already discussed:

Conceptual Weaknesses

First, the authors do not carefully separate the two ideas of approximating neuron behavior and human interpretability; they even use these concepts interchangeably. However, better approximating neuron behavior is not the same as providing better interpretability: even though longer formulas increase the Intersection over Union, they are not necessarily more human-interpretable.

Lack of Details

Second, the description of the NLI experiments lacks detail, and the conclusions are rather weak, as many questions remain open. For example, it remains unclear exactly how many atomic concepts were used (especially how many different POS tags). Unfortunately, at the time of writing this blog post, the linked repository [1] does not contain any code, which limits reproducibility. Moreover, it would be particularly interesting to study longer formulas: for example, logical forms in NLI could incorporate entire sentences and thus indicate overfitting.

Minor Weaknesses

Third, I observed minor weaknesses, including ambiguous use of language. For example, the authors use the term “penultimate hidden layer” when in fact referring to the last hidden layer. Furthermore, the framework for generating compositional explanations (Section 2) could have been formulated more cleanly, for example, independently of the evaluation function IoU.

Conclusion

Despite the weaknesses addressed above, the paper is generally of high quality and has potential for follow-up work due to the many open research questions. Future work could, for example, (1) improve human interpretability, (2) address the problem of missing probing datasets, and (3) study compositional explanations in further domains. Finally, the method is particularly interesting for the medical domain, where interpretability is becoming increasingly important.

References

[1] Jesse Mu and Jacob Andreas. "Compositional explanations of neurons." Advances in Neural Information Processing Systems 33 (2020). [PDF], [Code], [Talk]

[2] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. "Network dissection: Quantifying interpretability of deep visual representations." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. [PDF]

[3] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. "Places: A 10 million image database for scene recognition." IEEE transactions on pattern analysis and machine intelligence 40.6 (2017): 1452-1464. [PDF]

[4] Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and Antonio Torralba. "Semantic understanding of scenes through the ADE20K dataset." International Journal of Computer Vision 127.3 (2019): 302-321. [PDF]

[5] Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. "A large annotated corpus for learning natural language inference." arXiv preprint arXiv:1508.05326 (2015). [PDF]

[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. [PDF]

[7] Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, and Christopher Potts. "A fast unified model for parsing and sentence understanding." arXiv preprint arXiv:1603.06021 (2016). [PDF]

[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Communications of the ACM 60.6 (2017): 84-90. [PDF]

[9] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. [PDF]

[10] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. "GloVe: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014. [PDF]

[11] Fahim Dalvi, Avery Nortonsmith, Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, and James Glass. "NeuroX: A toolkit for analyzing individual neurons in neural networks." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019. [PDF]
