Front Neurosci. 2020 Aug 5;14:637. doi: 10.3389/fnins.2020.00637. eCollection 2020.

Hand-Gesture Recognition Based on EMG and Event-Based Camera Sensor Fusion: A Benchmark in Neuromorphic Computing


Enea Ceolini et al. Front Neurosci. 2020.

Abstract

Hand gestures are a form of non-verbal communication that individuals use in conjunction with speech. Nowadays, with the increasing use of technology, hand-gesture recognition is considered an important aspect of Human-Machine Interaction (HMI), allowing the machine to capture and interpret the user's intent and to respond accordingly. The ability to discriminate between human gestures can help in several applications, such as assisted living, healthcare, neuro-rehabilitation, and sports. Recently, multi-sensor data fusion mechanisms have been investigated to improve discrimination accuracy. In this paper, we present a sensor fusion framework that integrates complementary systems: the electromyography (EMG) signal from muscles and visual information. This multi-sensor approach, while improving accuracy and robustness, introduces the disadvantage of high computational cost, which grows exponentially with the number of sensors and the number of measurements. Furthermore, this large amount of data to process can increase classification latency, which is crucial in real-world scenarios such as prosthetic control. Neuromorphic technologies can be deployed to overcome these limitations since they allow parallel, real-time processing at low power consumption. In this paper, we present a fully neuromorphic sensor fusion approach for hand-gesture recognition comprising an event-based vision sensor and three neuromorphic processors. In particular, we used the Dynamic Vision Sensor (DVS) event-based camera and two neuromorphic platforms, Loihi and ODIN + MorphIC. The EMG signals were recorded using traditional electrodes and then converted into spikes to be fed into the chips. We collected a dataset of five sign-language gestures with synchronized visual and electromyography signals. We compared the fully neuromorphic approach to a baseline implemented with traditional machine learning on a portable GPU system. Within each chip's constraints, we designed specific spiking neural networks (SNNs) for sensor fusion that showed classification accuracy comparable to the software baseline. The neuromorphic alternatives have 20-40% longer inference times than the GPU system, but their significantly smaller energy-delay product (EDP) makes them between 30× and 600× more efficient. The proposed work represents a new benchmark that moves neuromorphic computing toward real-world scenarios.
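
The spike conversion described above uses the delta modulation scheme named in the Figure 2 caption. As a rough illustration, below is a minimal Python sketch of a delta-modulation encoder; the threshold and the synthetic test signal are hypothetical choices for the example, not the parameters used in the paper.

    import numpy as np

    def delta_modulation(signal, threshold):
        """Convert a 1-D analog signal into ON/OFF spike events.

        Emits an ON spike (+1) when the signal rises `threshold` above
        the last reconstruction level, and an OFF spike (-1) when it
        falls `threshold` below it (hypothetical parameterization).
        """
        level = signal[0]                  # running reconstruction level
        spikes = np.zeros(len(signal), dtype=np.int8)
        for t, x in enumerate(signal):
            if x - level >= threshold:     # upward crossing -> ON spike
                spikes[t] = 1
                level += threshold
            elif level - x >= threshold:   # downward crossing -> OFF spike
                spikes[t] = -1
                level -= threshold
        return spikes

    # Encode a synthetic EMG-like burst (illustrative signal only).
    t = np.linspace(0, 1, 1000)
    emg = 0.5 * np.sin(2 * np.pi * 10 * t) * np.exp(-3 * t)
    print(np.count_nonzero(delta_modulation(emg, threshold=0.02)), "spikes")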

Keywords: electromyography (EMG) signal processing; event-based camera; hand-gesture classification; neuromorphic engineering; sensor fusion; spiking neural networks (SNNs).


Figures

Figure 1. Example spike streams for the gesture “elle”: DVS (left) and EMG (right). In the EMG panel, the spikes are represented by dots while the continuous line is the raw EMG; different channels have different colors.
Figure 2. System overview, from left to right: (A) data collection setup featuring the DVS, the traditional camera, and the subject wearing the EMG armband sensor; (B) data streams of (b1) the DVS and (b2) the EMG transformed into spikes via the delta modulation approach; (C) the two neuromorphic systems, namely (c1) Loihi and (c2) ODIN + MorphIC; (D) the hand gestures that the system can recognize in real time.
Figure 3. Architectures of the neural networks implemented on the neuromorphic systems and used in the baselines. (A) CNN architecture implemented on Loihi; the corresponding baseline CNN receives APS frames instead of DVS events. (B) subMLP architectures implemented on MorphIC; the corresponding baseline subMLPs receive APS frames instead of DVS events. (C) MLP architecture for the EMG data implemented on Loihi (c1) and on ODIN (c2); the corresponding baseline MLPs receive EMG features instead of EMG events. The shading indicates the layers that are concatenated during the fusion of the networks.
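
As a concrete reading of the shaded fusion layers in Figure 3, the sketch below shows a two-branch PyTorch network whose vision and EMG hidden layers are concatenated before the classifier. All layer sizes, the 32x32 input resolution, and the 8-channel EMG input are illustrative assumptions; the paper's exact per-chip topologies differ.

    import torch
    import torch.nn as nn

    class FusionNet(nn.Module):
        """Two-branch network fused by concatenating hidden layers."""
        def __init__(self, n_classes=5):
            super().__init__()
            # Vision branch: small CNN over 32x32 frames (assumed size).
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(), nn.Linear(16 * 8 * 8, 128), nn.ReLU(),
            )
            # EMG branch: MLP over 8-channel inputs (assumed size).
            self.mlp = nn.Sequential(
                nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 128), nn.ReLU(),
            )
            # Fusion: concatenate the two 128-unit hidden layers.
            self.head = nn.Linear(128 + 128, n_classes)

        def forward(self, frames, emg):
            fused = torch.cat([self.cnn(frames), self.mlp(emg)], dim=1)
            return self.head(fused)

    net = FusionNet()
    logits = net(torch.randn(4, 1, 32, 32), torch.randn(4, 8))
    print(logits.shape)  # torch.Size([4, 5])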
Figure 4. Accuracy vs. stimulus duration for the Loihi system and its software baseline counterpart. In green, the results for the CNN (GPU); in purple, the results for the spiking CNN (Loihi). No classification is present for APS frames before 25 ms since the frame rate is 20 fps.
Figure 5. Accuracy vs. stimulus duration for the ODIN + MorphIC system and its software baseline counterpart. In blue, the results for the MLP (GPU); in red, the results for the spiking MLP (ODIN + MorphIC). No classification is present for APS frames before 25 ms since the frame rate is 20 fps.
Figure 6. Comparison between the two neuromorphic systems with respect to (A) the energy-delay product (EDP) (see section 1), (B) the number of synaptic operations (SOPs) (see section 2.3.1), and (C) the EDP normalized by the number of SOPs.
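
The EDP compared in panel (A) is energy per inference multiplied by inference latency. The toy calculation below, with hypothetical numbers rather than the paper's measurements, shows how a chip that is somewhat slower than a GPU can still be orders of magnitude more efficient by this metric.

    def energy_delay_product(energy_uj, latency_ms):
        """Energy-delay product: energy per inference x latency (uJ*ms)."""
        return energy_uj * latency_ms

    # Hypothetical per-inference numbers, for illustration only.
    gpu  = energy_delay_product(energy_uj=25_000, latency_ms=5.0)
    chip = energy_delay_product(energy_uj=30, latency_ms=7.0)  # ~40% slower
    print(f"neuromorphic chip is {gpu / chip:.0f}x better by EDP")  # ~595x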

