2021 Jul 8;15:655840. doi: 10.3389/fnhum.2021.655840. eCollection 2021.

A Lightweight Multi-Scale Convolutional Neural Network for P300 Decoding: Analysis of Training Strategies and Uncovering of Network Decision


Davide Borra et al. Front Hum Neurosci. 2021.

Abstract

Convolutional neural networks (CNNs), which automatically learn features from raw data to approximate functions, are being increasingly applied to the end-to-end analysis of electroencephalographic (EEG) signals, especially for decoding brain states in brain-computer interfaces (BCIs). Nevertheless, CNNs introduce a large number of trainable parameters, may require long training times, and lack interpretability of the learned features. The aim of this study is to propose a CNN design for P300 decoding, with emphasis on a lightweight design that still guarantees high performance, on the effects of different training strategies, and on the use of post-hoc techniques to explain network decisions. The proposed design, named MS-EEGNet, learns temporal features at two different timescales (i.e., multi-scale, MS) in an efficient and optimized (in terms of trainable parameters) way, and was validated on three P300 datasets. The CNN was trained using different strategies (within-participant and within-session, within-participant and cross-session, leave-one-subject-out, transfer learning) and was compared with several state-of-the-art (SOA) algorithms. Furthermore, variants of the baseline MS-EEGNet were analyzed to evaluate the impact of different hyper-parameters on performance. Lastly, saliency maps were used to derive representations of the relevant spatio-temporal features that drove CNN decisions. MS-EEGNet was the lightest of the tested SOA CNNs, despite its multiple timescales, and significantly outperformed the SOA algorithms. Post-hoc hyper-parameter analysis confirmed the benefits of the innovative aspects of MS-EEGNet. Furthermore, MS-EEGNet benefited from transfer learning, especially with a low number of training examples, suggesting that the proposed approach could be used in BCIs to accurately decode the P300 event while reducing calibration times. Representations derived from the saliency maps matched the P300 spatio-temporal distribution, further validating the proposed decoding approach. By specifically addressing the aspects of lightweight design, transfer learning, and interpretability, this study can help advance the development of deep learning algorithms for P300-based BCIs.

Keywords: P300; brain-computer interfaces; convolutional neural networks; decision explanation; electroencephalography; transfer learning.


Conflict of interest statement

The authors declare that this study received material support from NVIDIA Corporation in the form of the donated TITAN V GPU used for this research. The provider was not involved in the study design; the collection, analysis, or interpretation of data; the writing of this article; or the decision to submit it for publication.

Figures

Figure 1
Structure of MS-EEGNet. Layers are represented by colored rectangles reporting the layer name and main hyper-parameters. The tuple outside each rectangle gives the output shape of that layer. For all outputs except the last two (Flatten and Fully-connected + Softmax), the tuples contain three numbers: the number of feature maps (channel dimension), the number of spatial samples, and the number of temporal samples within each map. The input layer produces an output of shape (1, C, T), as it simply replicates the original input matrix of shape (C, T) as a single feature map. The temporal dimension is reduced from T to T//32 across the CNN (where // denotes the floor division operator) due to the average pooling operations. See sections EEG Decoding via CNNs and The Proposed Convolutional Neural Network and Its Variants for the meaning of symbols, and see Table 1 for further details.
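The T to T//32 reduction described in the caption follows from chaining average-pooling stages whose pool sizes multiply to 32; because floor division composes (⌊⌊T/a⌋/b⌋ = ⌊T/(a·b)⌋ for positive integers), the overall factor does not depend on how it is split across stages. A minimal sketch, assuming a hypothetical 4-then-8 pooling schedule (the actual pool sizes are given in the paper's Table 1, not reproduced here):

```python
def pooled_length(t, pool_sizes):
    """Temporal length after successive non-overlapping average-pooling
    stages; each stage floor-divides the current length by its pool size."""
    for p in pool_sizes:
        t //= p
    return t

# Hypothetical schedule with overall factor 32 (4 then 8): for any input
# length T, the result matches a single T // 32 floor division.
lengths = {T: pooled_length(T, [4, 8]) for T in (512, 1000, 140)}
```

This is why the caption can state the reduction as a single T//32 regardless of how the pooling is distributed over the layers.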
Figure 2
Impact of alternative design choices of MS-EEGNet on the performance metric. The figure reports the difference between the AUC scored with the variant and with the baseline design (i.e., ΔAUC = AUC_variant − AUC_baseline) for each condition of the hyper-parameter (HP) tested, reported on the x-axis as “HP_variant − HP_baseline.” The height of each gray bar represents the mean value of ΔAUC across the participants, while the error bars (black lines) represent the standard error of the mean. The results of Wilcoxon signed-rank tests (see section Statistics-ii) are also reported (*p < 0.05, **p < 0.01, ***p < 0.001, corrected for multiple tests) on top of the figure.
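The caption notes that the Wilcoxon p-values are corrected for multiple tests. Assuming a Benjamini-Hochberg false discovery rate procedure (a common choice for this kind of correction; the paper's Statistics section specifies the exact method used), a minimal sketch:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns one boolean per
    p-value, True where the null hypothesis is rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest sorted rank whose p-value passes the criterion
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

# Illustrative p-values: only the smallest (0.010 <= (1/4) * 0.05)
# survives correction here.
flags = benjamini_hochberg([0.010, 0.040, 0.030, 0.200])
```

The step-up rule rejects every hypothesis up to the largest rank k with p_(k) ≤ (k/m)·q, which controls the expected proportion of false discoveries among the rejections.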
Figure 3
AUC obtained with MS-EEGNet trained with the WS and TL-WS strategies for datasets 1–3 (panels A–C, respectively). Top plot in each panel: The AUC obtained in WS (white bars) is reported as a function of the percentage of training examples (reported on the x-axis), while the AUC obtained in TL-WS is reported also as a function of the number of participants (M) used to optimize the LOSO-M models (gray and hatched bars). The height of each bar represents the mean value of the performance metric across the participants, while the error bars (black lines) represent the standard error of the mean. Bottom plot in each panel: The AUC difference between the TL-WS and WS strategies (i.e., ΔAUC = AUC_TL-WS − AUC_WS) using the same percentage of training examples is reported using markers, and a red line denotes the mean value. For each percentage, a Wilcoxon signed-rank test was performed (see section Statistics-iii) to compare the TL-WS vs. WS strategy, and the statistical significance is reported (*p < 0.05, **p < 0.01, ***p < 0.001, corrected for multiple tests) on top of each plot.
Figure 4
Grand average spatio-temporal representations. The top panels (A–C) show the grand average spatio-temporal representation of MS-EEGNet trained with the LOSO strategy using signals from datasets 1–3. Positive gradients are shown in red, while negative gradients are shown in blue. The bottom panels (D–F) show the grand average ERP for the deviant (black lines) and standard (dashed black lines) stimuli associated with the most relevant electrode (the one with the largest gradient values) for datasets 1–3.
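The saliency maps underlying Figures 4–7 are gradients of the network's output with respect to each input sample, so a large |gradient| marks the electrodes and time points that most influenced the decision. The paper obtains these gradients by backpropagation through the trained CNN; the sketch below only illustrates the idea with a finite-difference gradient of a toy linear scorer standing in for the network (all names below are hypothetical):

```python
def saliency_map(score_fn, x, eps=1e-4):
    """Finite-difference approximation of d(score)/d(x[c][t]) for a C x T
    input given as a list of lists; a trained network would instead supply
    these gradients via backpropagation."""
    grads = []
    for c, row in enumerate(x):
        grad_row = []
        for t, v in enumerate(row):
            x[c][t] = v + eps
            hi = score_fn(x)
            x[c][t] = v - eps
            lo = score_fn(x)
            x[c][t] = v  # restore the original sample
            grad_row.append((hi - lo) / (2 * eps))
        grads.append(grad_row)
    return grads

# Toy linear "network": score = weighted sum of the input, so the
# saliency recovers the weights themselves.
w = [[0.5, -1.0], [2.0, 0.0]]
score = lambda x: sum(w[c][t] * x[c][t] for c in range(2) for t in range(2))
g = saliency_map(score, [[0.1, 0.2], [0.3, 0.4]])
```

For a real CNN the gradient depends on the input, which is why the paper averages the maps across examples and participants before interpreting them.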
Figure 5
Grand average absolute temporal representations of MS-EEGNet trained with the LOSO strategy using signals from datasets 1–3 (A–C); the mean value (black line) ± standard deviation (gray shaded areas) across participants is represented.
Figure 6
Grand average absolute spatial representations of MS-EEGNet trained with the LOSO strategy using signals from datasets 1–3 (A–C).
Figure 7
Grand average temporal and spatial absolute representations of MS-EEGNet trained on dataset 1 for a representative participant and session, adopting the LOSO, TL-WS, and WS strategies. In particular, the representations obtained using the LOSO strategy in the temporal and spatial domains are reported in (A,B), respectively. The representations obtained using the TL-WS strategy in the temporal and spatial domains are reported in (C) (colored lines) and (D–G), as the percentage of training examples of the new participant increased (15, 30, 45, 60%, from D–G). The representations obtained using the WS strategy in the temporal and spatial domains are reported in (H) (colored lines) and (I–L), as the percentage of training examples of the participant increased (15, 30, 45, 60%, from I–L). Note that in order to maintain the same scale across the strategies in the spatial absolute representations, in (D–G), the maximum gradient value represented (2.0e−1) was below the real maximum gradient value (3.3e−1), saturating the value in particular around P4.
