Front Neurosci. 2019 Feb 22;13:4.
doi: 10.3389/fnins.2019.00004. eCollection 2019.

REMODEL: Rethinking Deep CNN Models to Detect and Count on a NeuroSynaptic System


Rohit Shukla et al. Front Neurosci. 2019.

Abstract

In this work, we analyze the detection and counting of cars using the low-power IBM TrueNorth Neurosynaptic System. For our evaluation we used a publicly available dataset of overhead imagery of cars with surrounding context present in each image. The neural network for image analysis was trained with IBM's EEDN framework and deployed on the NS16e system. Through multiple experiments we identify the architectural bottlenecks in the TrueNorth system that prevent the deployment of large neural network structures, and we propose changes to the CNN models that circumvent these bottlenecks. The results of these evaluations are compared with Caffe-based implementations of standard neural networks deployed on a Titan X GPU. TrueNorth detects cars in the dataset with 97.60% accuracy and counts the number of cars in an image with 69.04% accuracy. The car detection accuracy and car count accuracy (within a -/+ 2 error margin) are comparable to those of high-precision neural networks such as AlexNet, GoogLeNet, and ResCeption, while showing a manifold improvement in power consumption.
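The car-count accuracy quoted above uses a -/+ 2 error margin. A minimal sketch of how such a margin-based accuracy could be computed (illustrative only; this is not the authors' evaluation code, and the sample counts are hypothetical):

```python
# Sketch of a "within +/- 2" counting accuracy metric.
# Hypothetical data; not the paper's evaluation code.

def count_accuracy(predicted, actual, margin=2):
    """Fraction of images whose predicted car count is within
    `margin` of the ground-truth count."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= margin)
    return hits / len(actual)

# Hypothetical predicted vs. true counts for five images:
pred = [12, 5, 0, 8, 30]
true = [13, 5, 3, 7, 27]
print(count_accuracy(pred, true))  # 0.6: three of five within +/- 2
```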

Keywords: IBM TrueNorth Neurosynaptic System; aerial image analysis; convolutional neural network; deep learning; neuromorphic computing; spiking neural network.


Figures

Figure 1
Sample images from COWC dataset (Mundhenk et al., 2016). Images are 192-by-192 pixels. For detection, (A,B), the model's goal is to detect whether a car is present in the center 48-by-48 pixels or not. Even though there are cars present in (B), the label has been set to false because there is no car in the center 48-by-48 pixels of the image. For the counting task, (C), the goal is to count the exact number of cars present in an image. The example shown in the figure has the label value “13,” since there are 13 cars in the image.
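The detection labeling described above depends only on the center 48-by-48 pixels of each 192-by-192 image. A minimal sketch of extracting that center patch (array names and layout are assumptions, not code from the paper):

```python
# Extract the center 48x48 patch from a 192x192 image, matching the
# labeling rule in the caption. Illustrative sketch; assumes an
# HxWxC NumPy array layout.
import numpy as np

def center_patch(image, size=48):
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

img = np.zeros((192, 192, 3), dtype=np.uint8)
patch = center_patch(img)
print(patch.shape)  # (48, 48, 3)
```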
Figure 2
(A) NS16e hardware system developed by IBM (image from Shah, 2016). (B) A single neurosynaptic core, which forms the computational building block of the TrueNorth chip, with details presented in Cassidy et al. (2013) and Nere (2013).
Figure 3
The figure describes the NS16e system setup. (A) The NS16e system consists of three stages: the hybrid CPU/FPGA system performs data pre-/post-processing and image binarization, and the computed spikes are then sent to the TN chips on which the CNN has been deployed. (B) An illustration of how splitters are used on TrueNorth to increase a neuron's fan-out.
Figure 4
This figure shows the standard AlexNet neural network architecture. The numbers written on top of the blocks show the output feature dimensions of each block in the CNN model. (A) The standard AlexNet neural network model (Krizhevsky et al., 2012). (B) Sections of the standard AlexNet structure that pose a problem when mapping it onto TrueNorth.
Figure 5
This figure shows the AlexNet implementation on TrueNorth. The numbers written on top of the blocks show the output feature dimensions of each block in the CNN model. (A) The modified AlexNet architecture for the TrueNorth implementation. (B) Sections of the standard AlexNet structure that were fixed when mapping it onto TrueNorth. The output feature dimensions of the 9th CNN layer in the proposed modified AlexNet differ from those of the standard AlexNet model (Figure 4), because the 8th CNN layer in the modified model has a padding of 1, unlike the standard AlexNet model, where the 8th CNN layer has no padding.
Figure 6
This figure shows the standard VGG-16 neural network architecture. The numbers written on top of the blocks show the output feature dimensions of each block in the CNN model. (A) The standard VGG-16 neural network model (Simonyan and Zisserman, 2014). (B) Three sections of the standard VGG-16 structure that pose a problem when mapping it onto TrueNorth.
Figure 7
This figure shows the standard VGG-16 neural network architecture modified for the TrueNorth implementation. The numbers written on top of the blocks show the output feature dimensions of each block in the CNN model. As with AlexNet, the CNN features in this model are downsampled all the way to a one-by-one output using convolution kernels of size 7 x 7 with a stride of 7.
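The 7 x 7 kernel with stride 7 collapses a 7 x 7 feature map into a single output. This follows from the standard convolution output-size formula, sketched below (the formula is standard CNN arithmetic, not specific to this paper):

```python
# Standard convolution output-size formula:
# out = floor((in + 2*padding - kernel) / stride) + 1

def conv_out(in_size, kernel, stride, padding=0):
    return (in_size + 2 * padding - kernel) // stride + 1

print(conv_out(7, kernel=7, stride=7))  # 1: a 7x7 map reduces to 1x1
print(conv_out(224, kernel=3, stride=1, padding=1))  # 224: size preserved
```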
Figure 8
This figure shows the modified VGG-16 neural network architecture for the TrueNorth NS16e hardware. The numbers written on top of the blocks show the output feature dimensions of each block in the CNN model. (A) The modified VGG-16 model (1), where the input image size is kept at 224x224 pixels. (B) The modified VGG-16 model (2), where the input image size is kept at 192x192 pixels.
Figure 9
Percentage of TN chips required on the NS16e system for splitters and the three CNN layers deployed on the hardware. These chip-consumption values are for the AlexNet CNN presented in Figure 5 and the VGG-16 CNN models presented in Figures 7, 8.
Figure 10
Hardware savings achieved by replacing 3 x 3 convolution kernels in the standard VGG-16 model with 1 x 1 convolution kernels. Modified VGG-16 model (1) refers to the CNN structure presented in Figure 8A, and modified VGG-16 model (2) refers to the CNN structure presented in Figure 8B. The x-axis shows the convolutional layers in the standard VGG-16 (Simonyan and Zisserman, 2014) CNN that originally had 3 x 3 convolution kernels but were replaced by 1 x 1 kernels in the modified VGG-16 model (Figure 8) for NS16e. The y-axis shows the number of chips consumed by each CNN layer when deployed onto NS16e.
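A back-of-the-envelope weight count shows why swapping 3 x 3 kernels for 1 x 1 kernels saves hardware: a 1 x 1 kernel needs nine times fewer weights per input-output channel pair. The sketch below is illustrative only; actual NS16e chip consumption depends on how layers map onto TrueNorth cores, and the channel counts here are hypothetical:

```python
# Weight count of a convolutional layer: in_ch * out_ch * k * k.
# Hypothetical 256-channel layer; not taken from the paper's models.

def conv_weights(in_ch, out_ch, k):
    return in_ch * out_ch * k * k

w3 = conv_weights(256, 256, 3)  # 3x3 kernels
w1 = conv_weights(256, 256, 1)  # 1x1 kernels
print(w3 // w1)  # 9: the 1x1 layer uses 9x fewer weights
```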
Figure 11
This figure shows deeper convolutional neural network architectures for the TrueNorth NS16e hardware. The numbers written on top of the blocks show the output feature dimensions of each block in the CNN model. These CNN models are extensions of the VGG-16 models proposed in Figure 8. (A) The deep convolutional neural network model where the input image size is kept at 224x224 pixels. (B) The deep convolutional neural network model where the input image size is kept at 192x192 pixels.
Figure 12
Error in estimating the car count label vs. the actual car count label. The plot compares the counting labels predicted by the AlexNet CNN (Figure 5A) and the deep modified VGG-16 model (Figure 11B). The x-axis shows the ranges of labels in the counting dataset; for example, the value 0–9 represents all counting labels in the range 0 to 9. In (A) the y-axis plots the average error in estimating the car count, and in (B) the y-axis plots the standard deviation of that error.
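The per-bin statistics plotted in this figure could be computed as sketched below (a bin width of 10 matches the 0–9, 10–19, ... label ranges; the sample predictions are hypothetical, not the paper's data):

```python
# Group absolute counting errors by ground-truth label range and
# report (mean, population std) per bin. Illustrative sketch only.
import statistics

def binned_error_stats(predicted, actual, bin_width=10):
    bins = {}
    for p, a in zip(predicted, actual):
        bins.setdefault(a // bin_width, []).append(abs(p - a))
    return {b: (statistics.mean(errs), statistics.pstdev(errs))
            for b, errs in sorted(bins.items())}

# Hypothetical counts: bin 0 holds labels 0-9, bin 1 labels 10-19, ...
stats = binned_error_stats([3, 12, 15, 28], [5, 11, 13, 25])
print(stats)
```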
Figure 13
Convolutional neural network structures trained using EEDN on the COWC dataset. The numbers written on top of the blocks show the output feature dimensions of each block in the CNN model. (A–F) show the design decisions for the six CNN models. Each proposed CNN model has either (1) a different input image size, (2) a different output feature count for the first four convolutional layers, or (3) a different number of pooling layers (CNN models 4α and 4β). (A–D) and (F) are all 23-layer CNN models whose final layer serves as a softmax loss function. (D) and (E) are meant for comparison with a prior approach to modeling CNNs. (E) is a 19-layer CNN model in which we do not downsample the image features to a 1 × 1 patch.
Figure 14
Stacked bar plot illustrating the percentage of chips utilized in the NS16e system by splitters and convolutional layers for the different convolutional models.

References

    1. Akopyan F., Sawada J., Cassidy A., Alvarez-Icaza R., Arthur J., Merolla P., et al. (2015). TrueNorth: design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 34, 1537–1557. doi: 10.1109/TCAD.2015.2474396
    2. Alom M. Z., Josue T., Rahman M. N., Mitchell W., Yakopcic C., Taha T. M. (2018). Deep versus wide convolutional neural networks for object recognition on neuromorphic system, in 2018 International Joint Conference on Neural Networks (IJCNN) (Rio de Janeiro), 1–8. doi: 10.1109/IJCNN.2018.8489635
    3. Cao Y., Chen Y., Khosla D. (2015). Spiking deep convolutional neural networks for energy-efficient object recognition. Int. J. Comput. Vision 113, 54–66. doi: 10.1007/s11263-014-0788-3
    4. Cassidy A. S., Merolla P., Arthur J. V., Esser S. K., Jackson B., Alvarez-Icaza R., et al. (2013). Cognitive computing building block: a versatile and efficient digital neuron model for neurosynaptic cores, in The 2013 International Joint Conference on Neural Networks (IJCNN) (Dallas, TX), 1–10. doi: 10.1109/IJCNN.2013.6707077
    5. Clawson T. S., Ferrari S., Fuller S. B., Wood R. J. (2016). Spiking neural network (SNN) control of a flapping insect-scale robot, in 2016 IEEE 55th Conference on Decision and Control (CDC) (Las Vegas, NV), 3381–3388. doi: 10.1109/CDC.2016.7798778
