bioRxiv [Preprint]. 2023 Sep 22:2023.09.18.558113. doi: 10.1101/2023.09.18.558113.

Neural Data Transformer 2: Multi-context Pretraining for Neural Spiking Activity


Joel Ye et al. bioRxiv.

Abstract

The neural population spiking activity recorded by intracortical brain-computer interfaces (iBCIs) contains rich structure. Current models of such spiking activity are largely prepared for individual experimental contexts, restricting data volume to that collectable within a single session and limiting the effectiveness of deep neural networks (DNNs). The purported challenge in aggregating neural spiking data is the pervasiveness of context-dependent shifts in the neural data distributions. However, large-scale unsupervised pretraining by nature spans heterogeneous data, and has proven to be a fundamental recipe for successful representation learning across deep learning. We thus develop Neural Data Transformer 2 (NDT2), a spatiotemporal Transformer for neural spiking activity, and demonstrate that pretraining can leverage motor BCI datasets that span sessions, subjects, and experimental tasks. NDT2 enables rapid adaptation to novel contexts in downstream decoding tasks and opens the path to deployment of pretrained DNNs for iBCI control. Code: https://github.com/joel99/context_general_bci.


Figures

Figure 1.
A. NDT2 is a spatiotemporal Transformer encoder-decoder in the style of He et al. [17], operating on binned spiking activity. During masked autoencoding pretraining, a spike rate decoder reconstructs masked spikes from encoder outputs; downstream, additional decoders similarly use encoder outputs for behavior prediction (e.g. cursor movement). The encoder and decoders both receive context embeddings as inputs. These embeddings are learned vectors for each unique type of metadata, such as session or subject ID. B. NDT2 aims to enable pretraining over diverse forms of related neural data. Neural data from other sessions within a single subject are the most relevant but limited in volume, and may be complemented by broader sources of data.
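As a concrete illustration of the masked-autoencoding objective and the learned context embeddings described in this caption, the following PyTorch sketch trains a simplified model on binned spike counts. It tokenizes whole time bins rather than the spatiotemporal patches NDT2 actually uses, and every module name, shape, and hyperparameter here is an assumption for exposition, not the repository's implementation.

import torch
import torch.nn as nn

class MaskedSpikeAutoencoder(nn.Module):
    # Simplified masked autoencoder over binned spike counts. Context
    # embeddings (session, subject) are learned vectors added to every
    # token so the encoder and decoder can condition on metadata.
    def __init__(self, n_channels=96, d_model=128, n_sessions=10,
                 n_subjects=4, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.spike_embed = nn.Linear(n_channels, d_model)  # embed one time bin
        self.session_embed = nn.Embedding(n_sessions, d_model)
        self.subject_embed = nn.Embedding(n_subjects, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        # Spike-rate decoder head: predicts per-channel log-rates.
        self.rate_head = nn.Linear(d_model, n_channels)

    def forward(self, spikes, session_id, subject_id):
        # spikes: (batch, time_bins, channels) integer spike counts
        B, T, _ = spikes.shape
        tokens = self.spike_embed(spikes.float())
        # Randomly mask a fraction of time bins, replacing them with a mask token.
        mask = torch.rand(B, T, device=spikes.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, T, -1), tokens)
        # Add learned context embeddings for each trial's session and subject.
        ctx = self.session_embed(session_id) + self.subject_embed(subject_id)
        latent = self.encoder(tokens + ctx.unsqueeze(1))
        log_rates = self.rate_head(latent)
        # Poisson negative log-likelihood, computed on masked bins only.
        nll = nn.functional.poisson_nll_loss(
            log_rates[mask], spikes[mask].float(), log_input=True)
        return nll, latent

# Example usage with random data (hypothetical shapes):
model = MaskedSpikeAutoencoder()
spikes = torch.randint(0, 3, (8, 50, 96))  # 8 trials, 50 bins, 96 channels
nll, _ = model(spikes,
               session_id=torch.zeros(8, dtype=torch.long),
               subject_id=torch.zeros(8, dtype=torch.long))
nll.backward()

Computing the reconstruction loss only on masked bins mirrors the He et al. [17] masked-autoencoder recipe that the caption cites; the full model additionally groups channels into spatial patches to form spatiotemporal tokens.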
Figure 2. Model Training.
A. We model neural activity from human and monkey reach. In monkey models, evaluation sessions are drawn from a self-paced reaching dataset [46]; multi-session and multi-subject models pretrain with other sessions in these data. The multi-task model pretrains with the other monkey reach data. Human models use a similar volume of data. B. A multi-session model pretrains with data from an evaluation subject, with held-out evaluation sessions. Then, for each evaluation session, we first perform unsupervised tuning of the pretrained model, and then train a supervised probe on this tuned model. C. We show which model components are learned (receive gradients) during pretraining and the two tuning stages. For example, supervised probes use an encoder that received both pretraining and tuning on a target session. All tuning is end-to-end.
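The staged workflow in panels B and C can be read as two short training loops: continued masked-autoencoding on the target session, followed by an end-to-end supervised probe. The sketch below builds on the hypothetical MaskedSpikeAutoencoder sketch under Figure 1 and uses placeholder data loaders; it is an illustrative reading of the procedure, not the released training code.

import copy
import torch
import torch.nn as nn

def unsupervised_tune(pretrained, target_loader, steps=200, lr=1e-4):
    # Stage 2: continue masked-autoencoding training on the target session only.
    model = copy.deepcopy(pretrained)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (spikes, session_id, subject_id) in zip(range(steps), target_loader):
        nll, _ = model(spikes, session_id, subject_id)
        opt.zero_grad()
        nll.backward()
        opt.step()
    return model

def train_supervised_probe(tuned, labeled_loader, behavior_dim=2,
                           steps=200, lr=1e-4):
    # Stage 3: train a behavior decoder (e.g. cursor velocity) on encoder
    # outputs. Per panel C, tuning is end to end: both the probe and the
    # tuned encoder receive gradients. (A full implementation would also
    # disable input masking for this stage.)
    probe = nn.Linear(tuned.rate_head.in_features, behavior_dim)
    opt = torch.optim.AdamW(list(tuned.parameters()) + list(probe.parameters()),
                            lr=lr)
    for _, (spikes, session_id, subject_id, behavior) in zip(range(steps),
                                                             labeled_loader):
        _, latent = tuned(spikes, session_id, subject_id)
        pred = probe(latent)  # (batch, time_bins, behavior_dim)
        loss = nn.functional.mse_loss(pred, behavior)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return tuned, probe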
Figure 3.
NDT2 enables pretraining over multi-session, multi-subject, and multi-task data. We show unsupervised and supervised performance (mean of 5 sessions, SEM intervals over 3 seeds) on sorted (A) and unsorted (B) spiking activity. Higher is better for R2; lower is better for negative log-likelihood (NLL). Pretraining data are size-matched at 20Ks, except for from-scratch single-session data. NDT2 improves with pretraining on all data sources, whereas stitching is ineffective. NDT1 aggregation is helpful but does not apply beyond session transfer. A reference well-tuned decoding score is estimated from the rEFH model [47].
Figure 4. Scaling of transfer on RTT.
We compare supervised R2 (A) and unsupervised NLL (B) scaling as we increase the pretraining dataset size. Each point is a model; non-single-session models are calibrated to evaluation sessions with 100 trials. All pretraining improves over the leftmost 100-trial single-session from-scratch model, though scaling benefits vary by data source. C. We seek a convergence point between pretraining and training from scratch as we increase the number of trials used in the target context. Models converge by 3K trials.
Figure 5. Tuning and evaluating pretrained decoders.
A. In offline experiments, we compare a multi-session pretrained model calibrated to a novel day against the current approaches of a pretrained model without adaptation (0-Shot) and from-scratch training (yellow, orange). Both supervised and unsupervised tuning outperform these strategies. Standard error is shown over 3 seeds. B. In human control pilot experiments, we evaluated models on 3 test days. “0-Shot” models use no data from test days. C. Average reach times in 40 center-out trials are shown for 3 decoders over 2–3 sessions. This average includes trials where the target is not acquired within 10 s, though this occurs only 1–2 times in 40 trials. Session NDT2 uses <250 trials of data, while Broad NDT2 also includes other human and monkey data. Pretrained models provide consistent 0-Shot control, while OLE [23] can sometimes fail (shown by X). Control improves with either unsupervised or supervised tuning. However, OLE appears to overtake NDT2 with supervision.

References

    1. Vyas Saurabh, Golub Matthew D, Sussillo David, and Shenoy Krishna V. Computation through neural population dynamics. Annual Review of Neuroscience, 43:249–275, 2020.
    2. Altan Ege, Solla Sara A, Miller Lee E, and Perreault Eric J. Estimating the dimensionality of the manifold underlying multi-electrode neural recordings. PLOS Computational Biology, 17(11):1–23, 2021. doi: 10.1371/journal.pcbi.1008591.
    3. O’Shea Daniel J, Duncker Lea, Goo Werapong, Sun Xulu, Vyas Saurabh, Trautmann Eric M, Diester Ilka, Ramakrishnan Charu, Deisseroth Karl, Sahani Maneesh, et al. Direct neural perturbations reveal a dynamical mechanism for robust computation. bioRxiv, pages 2022–12, 2022.
    4. Sadtler Patrick T, Quick Kristin M, Golub Matthew D, Chase Steven M, Ryu Stephen I, Tyler-Kabara Elizabeth C, Yu Byron M, and Batista Aaron P. Neural constraints on learning. Nature, 512(7515):423–426, 2014.
    5. Vyas Saurabh, Even-Chen Nir, Stavisky Sergey D, Ryu Stephen I, Nuyujukian Paul, and Shenoy Krishna V. Neural population dynamics underlying motor learning transfer. Neuron, 97(5):1177–1186, 2018.
