Sensors (Basel). 2023 Jul 1;23(13):6077.
doi: 10.3390/s23136077.

k-Fold Cross-Validation Can Significantly Over-Estimate True Classification Accuracy in Common EEG-Based Passive BCI Experimental Designs: An Empirical Investigation

Jacob White et al. Sensors (Basel).

Abstract

In passive BCI studies, a common approach is to collect data from mental states of interest during relatively long trials and divide these trials into shorter "epochs" to serve as individual samples in classification. While it is known that using k-fold cross-validation (CV) in this scenario can result in unreliable estimates of mental state separability (due to autocorrelation in the samples derived from the same trial), k-fold CV is still commonly used and reported in passive BCI studies. What is not known is the extent to which k-fold CV misrepresents true mental state separability. This makes it difficult to interpret the results of studies that use it. Furthermore, if the seriousness of the problem were clearly known, perhaps more researchers would be aware that they should avoid it. In this work, a novel experiment explored how the degree of correlation among samples within a class affects EEG-based mental state classification accuracy estimated by k-fold CV. Results were compared to a ground-truth (GT) accuracy and to "block-wise" CV, an alternative to k-fold which is purported to alleviate the autocorrelation issues. Factors such as the degree of true class separability and the feature set and classifier used were also explored. The results show that, under some conditions, k-fold CV inflated the GT classification accuracy by up to 25%, but block-wise CV underestimated the GT accuracy by as much as 11%. It is our recommendation that the number of samples derived from the same trial should be reduced whenever possible in single-subject analysis, and that both the k-fold and block-wise CV results are reported.

Keywords: EEG; cross validation; passive brain–computer interface; time-series.
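The difference between the two splitting schemes described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the trial and epoch counts, the function names, and the round-robin assignment of trials to folds are all assumptions made for illustration, and class labels and the classifier are omitted. The sketch only counts how many trials contribute epochs to both the training and the test set in each fold.

```python
import random

def make_epochs(n_trials=10, epochs_per_trial=20):
    """Return one trial ID per epoch (hypothetical layout: 10 trials x 20 epochs)."""
    return [t for t in range(n_trials) for _ in range(epochs_per_trial)]

def kfold_splits(trial_ids, k=5, seed=0):
    """Shuffled k-fold over individual epochs, ignoring trial structure."""
    idx = list(range(len(trial_ids)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def blockwise_splits(trial_ids, k=5):
    """Block-wise CV: every epoch of a trial goes to the same fold."""
    trials = sorted(set(trial_ids))
    trial_folds = [set(trials[i::k]) for i in range(k)]
    for held_out in trial_folds:
        test = [j for j, t in enumerate(trial_ids) if t in held_out]
        train = [j for j, t in enumerate(trial_ids) if t not in held_out]
        yield train, test

def shared_trials(trial_ids, train, test):
    """Count trials that contribute epochs to both the training and test sets."""
    return len({trial_ids[j] for j in train} & {trial_ids[j] for j in test})

trial_ids = make_epochs()
kfold_leak = [shared_trials(trial_ids, tr, te) for tr, te in kfold_splits(trial_ids)]
block_leak = [shared_trials(trial_ids, tr, te) for tr, te in blockwise_splits(trial_ids)]
print("k-fold shared trials per fold:", kfold_leak)
print("block-wise shared trials per fold:", block_leak)
```

In this toy layout, every k-fold split places epochs from the same trial on both sides of the train/test boundary, which is the route by which within-trial autocorrelation can inflate the accuracy estimate. The block-wise split shares no trials between the two sets.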

Conflict of interest statement

The authors declare no conflict of interest, and the funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1. BCI system overview.
Figure 2. (a) An example of k-fold CV; epochs from a single trial end up being mixed into both the training and testing sets. (b) An example of block-wise CV; by not breaking up the trial structure, epochs from a given trial remain exclusively in either the training or testing set.
Figure 3. Comparison of true-labeled classification accuracies when using k-fold and blocked CV. +/0, −/0, and +/− are the positive/neutral, negative/neutral, and positive/negative classifications, respectively.
Figure 4. Comparison of randomized-labeled classification accuracies of k-fold and blocked CV. +/0, −/0, and +/− are the positive/neutral, negative/neutral, and positive/negative classifications, respectively.
Figure 5. Overview of the experimental protocol for recording multiple trial lengths.
Figure 6. SVM bandpower accuracies for true- and random-labelled cross-validations.
