Understanding the role of individual units in a deep neural network

David Bau et al. Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30071-30078.
doi: 10.1073/pnas.1907375117. Epub 2020 Sep 1.

Abstract

Deep neural networks excel at finding hierarchical representations that solve complex tasks over large datasets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. We find evidence that the network has learned many object classes that play crucial roles in classifying scene classes. Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes. By analyzing changes made when small sets of units are activated or deactivated, we find that objects can be added and removed from the output scenes while adapting to the context. Finally, we apply our analytic framework to understanding adversarial attacks and to semantic image editing.
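The scoring step at the heart of network dissection can be summarized compactly. Below is a minimal sketch, not the authors' released code: one unit's thresholded activation map is compared against one concept's segmentation mask using intersection over union (IoU). The tensor shapes and the assumption that activations were already upsampled to the segmentation resolution are simplifications for illustration.

```python
# Minimal sketch of the dissection score. Assumes the unit's activation
# maps have been upsampled to the segmentation-mask resolution; the 0.99
# quantile follows the paper's "top 1%" thresholding.
import torch

def unit_concept_iou(activations, concept_masks, quantile=0.99):
    """activations: (N, H, W) maps for one unit over N images.
    concept_masks: (N, H, W) binary masks for one concept.
    Returns IoU between the unit's top-quantile region and the concept."""
    # Threshold at the unit's top-1% activation level over the whole set.
    threshold = torch.quantile(activations.flatten(), quantile)
    unit_mask = activations > threshold
    concept = concept_masks.bool()
    intersection = (unit_mask & concept).sum().float()
    union = (unit_mask | concept).sum().float()
    return (intersection / union).item() if union > 0 else 0.0
```

A unit is then labeled with the concept that maximizes this score across the concept dictionary.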

Keywords: computer vision; deep networks; machine learning.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
The emergence of single-unit object detectors within a VGG-16 scene classifier. (A) VGG-16 consists of 13 convolutional layers, conv1_1 through conv5_3, followed by three fully connected layers: fc6, fc7, and fc8. (B) The activation of a single filter on an input image can be visualized as the region where the filter activates beyond its top 1% quantile level. (C) Single units are scored by matching high-activating regions against a set of human-interpretable visual concepts; each unit is labeled with its best-matching concept and visualized with maximally activating images. (D) Concepts that match units in the final convolutional layer are summarized, showing a broad diversity of detectors for objects, object parts, materials, and colors. Many concepts are associated with multiple units. (E) Comparing all of the layers of the network reveals that most object detectors emerge at the last convolutional layers. (F) Although the training set contains no object labels, unit 150 emerges as an airplane object detector that activates much more strongly on airplane objects than nonairplane objects, as tested against a dataset of labeled object images not previously seen by the network. The jitter plot shows the unit's peak activations on 1,000 randomly sampled airplane and 1,000 nonairplane ImageNet images, and the curves show kernel density estimates of these activations.
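For concreteness, panel B's visualization can be sketched with a forward hook on torchvision's VGG-16 layout. The paper's classifier is trained on Places365 scenes, so the ImageNet weights and the random input below are stand-ins, not the authors' setup.

```python
# Sketch of panel B: capture conv5_3 activations with a forward hook and
# binarize unit 150 at its top-1% level. ImageNet weights and the random
# input are placeholders; the paper's model is trained on Places365, and
# the quantile is computed there over the whole dataset, not one image.
import torch
import torchvision.models as models

vgg = models.vgg16(weights="IMAGENET1K_V1").eval()
feats = {}
conv5_3 = vgg.features[28]  # conv5_3 in torchvision's VGG-16 indexing
conv5_3.register_forward_hook(lambda m, i, o: feats.update(out=o.detach()))

image = torch.randn(1, 3, 224, 224)  # placeholder for a real scene image
with torch.no_grad():
    vgg(image)
act = feats["out"][0, 150]                        # unit 150's (H, W) map
mask = act > torch.quantile(act.flatten(), 0.99)  # top-1% activation region
```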
Fig. 2.
A few units play important roles in classification performance. (A) The four conv5_3 units that cause the most damage to balanced classification accuracy for ski resort when each unit is individually removed from the network; dissection reveals that these most-important units detect visual concepts that are salient to ski resorts. Accuracy lost (acc lost) is measured on both training data and held-out validation (val) data. (B) When the units most important to the class are removed all together, balanced single-class accuracy drops to near-chance levels. When the 492 least-important units in conv5_3 are removed all together (leaving only the 20 most-important units), accuracy remains high. (C) The effect on ski resort prediction accuracy when removing sets of units of successively larger sizes, with units removed in ascending and descending order of their individual impact on accuracy. (D) Repeating the experiment for each of the 365 scene classes. Each point plots single-class classification accuracy in one of three settings: the original network, the network after removing the 20 units most important to the class, and the network with all conv5_3 units removed except those 20 most-important units. On the y axis, classes are ordered alphabetically. (E) The relationship between unit importance and interpretability: units that are among the top four important units for more classes also match semantic concepts more closely, as measured by IoU_{u,c}.
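The unit-removal experiment reduces to zeroing selected channels during the forward pass. A minimal sketch, assuming a PyTorch model; the unit indices and the evaluation step in the usage comments are illustrative, not the paper's actual values.

```python
# Sketch of unit ablation: zero the output of chosen conv5_3 filters with
# a forward hook, then re-evaluate single-class accuracy. Unit indices in
# the usage comment are hypothetical.
import torch

def ablate_units(layer, units):
    """Return a hook handle that zeroes the given channels of `layer`."""
    def hook(module, inputs, output):
        output = output.clone()     # avoid mutating the original tensor
        output[:, units] = 0.0      # silence the selected units everywhere
        return output
    return layer.register_forward_hook(hook)

# handle = ablate_units(vgg.features[28], [18, 149, 201, 330])  # hypothetical units
# ... measure balanced accuracy on the ski-resort class ...
# handle.remove()  # restore the original network
```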
Fig. 3.
The emergence of object- and part-specific units within a Progressive GAN generator (19). (A) The analyzed Progressive GAN consists of 15 convolutional layers that transform a random input vector into a synthesized image of a kitchen. (B) A single filter is visualized as the region of the output image where the filter activates beyond its top 1% quantile level; note that the filters are all precursors to the output. (C) Dissecting all of the layers of the network shows a peak in object-specific units at layer 5 of the network. (D) A detailed examination of layer 5 shows more part-specific units than object-specific ones, with many visual concepts corresponding to multiple units. (E) Units do not correspond to exact pixel patterns: A wide range of visual appearances for ovens and chairs is generated when an oven or chair part unit is activated. (F) When a unit specific to window parts is tested as a classifier, on average the unit activates more strongly on generated images that contain large windows than on images that do not. The jitter plot shows the peak activation of unit 314 on 800 generated images with windows larger than 5% of the image area, as estimated by a segmentation algorithm, and on 800 generated images without. (G) Some counterexamples: images for which unit 314 does not activate but where windows are synthesized nevertheless.
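Panel F's unit-as-classifier test can be sketched as below. `generate_with_features` and `window_fraction` are hypothetical helpers standing in for the Progressive GAN generator and the segmentation network; only the peak-activation comparison is concrete.

```python
# Sketch of panel F: compare unit 314's peak activation on generated
# images with and without large windows. `generate_with_features` and
# `window_fraction` are hypothetical stand-ins for the generator and the
# segmentation model used in the paper.
import torch

def peak(feature_maps, unit):
    """feature_maps: (N, C, H, W); returns each image's spatial peak for `unit`."""
    return feature_maps[:, unit].amax(dim=(1, 2))

# images, feats = generate_with_features(n=1600, layer=5)  # hypothetical API
# big = window_fraction(images) > 0.05                     # windows > 5% of area
# print(peak(feats, 314)[big].mean(), peak(feats, 314)[~big].mean())
```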
Fig. 4.
The causal effect of altering units within a GAN generator. (A) When successively larger sets of units are removed from a GAN trained to generate outdoor church scenes, the tree area of the generated images is reduced. Removing 20 tree units removes more than half of the generated tree pixels from the output. (B) Qualitative results: Removing tree units affects trees while leaving other objects intact. Building parts that were previously occluded by trees are rendered as if the objects behind the trees have been revealed. (C) Doors can be added to buildings by activating 20 door units. The location, shape, size, and style of the rendered door depend on the location of the activated units. The same activation levels produce different doors, or no door at all (case 4), depending on location. (D) The same context dependence can be seen quantitatively: Doors can be added in reasonable locations, such as at the location of a window, but not in abnormal locations, such as on a tree or in the sky.
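Both interventions in this figure, zeroing tree units everywhere and forcing door units on inside a region, can be expressed as a single forward hook. A hedged sketch follows; the layer choice, unit indices, and the activation value in the usage comments are assumptions for illustration.

```python
# Sketch of the causal intervention: overwrite chosen units of a generator
# layer with a fixed value, either everywhere (removal, value=0.0) or only
# inside a boolean (H, W) region (insertion). All specifics are assumed.
import torch

def intervene(layer, units, value, region=None):
    def hook(module, inputs, output):
        out = output.clone()
        mask = torch.zeros_like(out, dtype=torch.bool)
        mask[:, units] = True           # select the targeted channels
        if region is not None:
            mask &= region              # broadcast (H, W) over (N, C, H, W)
        out[mask] = value               # force the chosen activations
        return out
    return layer.register_forward_hook(hook)

# Removal:   intervene(gen_layer, tree_units, value=0.0)
# Insertion: intervene(gen_layer, door_units, value=10.0, region=door_mask)
```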
Fig. 5.
Application: Visualizing an adversarial attack. (A) The test image is correctly labeled as a ski resort, but when an adversarial perturbation is added, the visually indistinguishable result is classified as a bedroom. (B) Visualization of the attack on the four units most important to the ski resort class and the four units most important to the bedroom class. Areas of maximum increase and decrease are shown; Δpeak indicates the change in the peak activation level for the unit. (C) Results over 1,000 images attacked so that they are misclassified as various incorrect target classes. The units that change the most are those that dissection has identified as most important to the source and target classes. The mean absolute change in peak unit activation is graphed, with 99% CIs shown.
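The Δpeak quantity in panel B can be sketched by reusing the forward-hook idea: record every unit's spatial peak on the clean and adversarial images and take the difference. The hooked layer and the images in the usage comment are assumptions; the attack itself (a standard imperceptible perturbation) is out of scope here.

```python
# Sketch of the Δpeak measurement: each unit's spatial peak activation on
# the adversarial image minus its peak on the clean image.
import torch

def peak_activations(model, layer, image):
    feats = {}
    handle = layer.register_forward_hook(
        lambda m, i, o: feats.update(a=o.detach()))
    with torch.no_grad():
        model(image)
    handle.remove()
    return feats["a"].amax(dim=(2, 3))[0]  # (C,): one peak per unit

# delta_peak = (peak_activations(vgg, vgg.features[28], adversarial)
#               - peak_activations(vgg, vgg.features[28], clean))
```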
Fig. 6.
Application: Painting by manipulating GAN neurons. (A) An interactive interface allows a user to choose several high-level semantic visual concepts and paint them onto an image. Each concept corresponds to 20 units in the GAN. (B) After the user adds a dome in the specified location, the result is a modified image in which a dome has been added in place of the original steeple. After the user’s high-level intent has been expressed by changing 20 dome units, the generator automatically handles the pixel-level details of how to fit together objects to keep the output scene realistic.
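The painting interaction reduces to the same intervention sketched under Fig. 4: a paint stroke becomes a boolean region at the edited layer's resolution, and the 20 dome units are forced to a high value inside it. A usage sketch follows; the spatial size, unit indices, and activation level are illustrative assumptions.

```python
# Usage sketch reusing `intervene` from the Fig. 4 example. All specifics
# (layer resolution, unit indices, activation value) are assumed.
import torch

H, W = 8, 8                                  # spatial size of the edited layer
stroke = torch.zeros(H, W, dtype=torch.bool)
stroke[2:5, 3:6] = True                      # where the user painted "dome"
dome_units = list(range(20))                 # placeholder for dissected units
# handle = intervene(gen_layer, dome_units, value=10.0, region=stroke)
# edited = generator(z)  # dome rendered in place; the GAN adapts the context
# handle.remove()
```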

References

    1. Zhou B., Khosla A., Lapedriza A., Oliva A., Torralba A., Object detectors emerge in deep scene CNNs. arXiv:1412.6856 (22 December 2014).
    2. Zeiler M. D., Fergus R., “Visualizing and understanding convolutional networks” in European Conference on Computer Vision (Springer, Berlin, Germany, 2014), pp. 818–833.
    3. Mahendran A., Vedaldi A., “Understanding deep image representations by inverting them” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, New York, NY, 2015), pp. 5188–5196.
    4. Olah C., et al., The building blocks of interpretability. Distill 3, e10 (2018).
    5. Bau A., et al., Identifying and controlling important neurons in neural machine translation. https://openreview.net/pdf?id=H1z-PsR5KX. Accessed 24 August 2020.
