bioRxiv [Preprint]. 2023 Sep 22:2023.09.18.558113. doi: 10.1101/2023.09.18.558113.

Neural Data Transformer 2: Multi-context Pretraining for Neural Spiking Activity


Joel Ye et al. bioRxiv.

Abstract

The neural population spiking activity recorded by intracortical brain-computer interfaces (iBCIs) contains rich structure. Current models of such spiking activity are largely prepared for individual experimental contexts, restricting data volume to that collectable within a single session and limiting the effectiveness of deep neural networks (DNNs). The purported challenge in aggregating neural spiking data is the pervasiveness of context-dependent shifts in the neural data distributions. However, large-scale unsupervised pretraining by nature spans heterogeneous data, and has proven to be a fundamental recipe for successful representation learning across deep learning. We thus develop Neural Data Transformer 2 (NDT2), a spatiotemporal Transformer for neural spiking activity, and demonstrate that pretraining can leverage motor BCI datasets that span sessions, subjects, and experimental tasks. NDT2 enables rapid adaptation to novel contexts in downstream decoding tasks and opens the path to deployment of pretrained DNNs for iBCI control. Code: https://github.com/joel99/context_general_bci.


Figures

Figure 1.
A. NDT2 is a spatiotemporal Transformer encoder-decoder in the style of He et al. [17], operating on binned spiking activity. During masked autoencoding pretraining, a spike rate decoder reconstructs masked spikes from encoder outputs; downstream, additional decoders similarly use encoder outputs for behavior prediction (e.g. cursor movement). The encoder and decoders both receive context embeddings as inputs. These embeddings are learned vectors for each unique type of metadata, such as session or subject ID. B. NDT2 aims to enable pretraining over diverse forms of related neural data. Neural data from other sessions within a single subject are the most relevant but limited in volume, and may be complemented by broader sources of data.
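As a concrete illustration of the masked-autoencoding objective and the learned context embeddings described in this caption, the following PyTorch sketch trains a simplified model on binned spike counts. It tokenizes whole time bins rather than the spatiotemporal patches NDT2 actually uses, and every module name, shape, and hyperparameter here is an assumption for exposition, not the repository's implementation.

import torch
import torch.nn as nn

class MaskedSpikeAutoencoder(nn.Module):
    # Simplified masked autoencoder over binned spike counts. Context
    # embeddings (session, subject) are learned vectors added to every
    # token so the encoder and decoder can condition on metadata.
    def __init__(self, n_channels=96, d_model=128, n_sessions=10,
                 n_subjects=4, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.spike_embed = nn.Linear(n_channels, d_model)  # embed one time bin
        self.session_embed = nn.Embedding(n_sessions, d_model)
        self.subject_embed = nn.Embedding(n_subjects, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        # Spike-rate decoder head: predicts per-channel log-rates.
        self.rate_head = nn.Linear(d_model, n_channels)

    def forward(self, spikes, session_id, subject_id):
        # spikes: (batch, time_bins, channels) integer spike counts
        B, T, _ = spikes.shape
        tokens = self.spike_embed(spikes.float())
        # Randomly mask a fraction of time bins, replacing them with a mask token.
        mask = torch.rand(B, T, device=spikes.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, T, -1), tokens)
        # Add learned context embeddings for each trial's session and subject.
        ctx = self.session_embed(session_id) + self.subject_embed(subject_id)
        latent = self.encoder(tokens + ctx.unsqueeze(1))
        log_rates = self.rate_head(latent)
        # Poisson negative log-likelihood, computed on masked bins only.
        nll = nn.functional.poisson_nll_loss(
            log_rates[mask], spikes[mask].float(), log_input=True)
        return nll, latent

# Example usage with random data (hypothetical shapes):
model = MaskedSpikeAutoencoder()
spikes = torch.randint(0, 3, (8, 50, 96))  # 8 trials, 50 bins, 96 channels
nll, _ = model(spikes,
               session_id=torch.zeros(8, dtype=torch.long),
               subject_id=torch.zeros(8, dtype=torch.long))
nll.backward()

Computing the reconstruction loss only on masked bins mirrors the He et al. [17] masked-autoencoder recipe that the caption cites; the full model additionally groups channels into spatial patches to form spatiotemporal tokens.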
Figure 2. Model Training.
A. We model neural activity from human and monkey reach. In monkey models, evaluation sessions are drawn from a self-paced reaching dataset [46]; multi-session and multi-subject models pretrain with other sessions in these data. The multi-task model pretrains with the other monkey reach data. Human models use a similar volume of data. B. A multi-session model pretrains with data from an evaluation subject, with held-out evaluation sessions. Then, for each evaluation session, we first perform unsupervised tuning of the pretrained model, and then train a supervised probe on this tuned model. C. We show which model components are learned (receive gradients) during pretraining and the two tuning stages. For example, supervised probes use an encoder that received both pretraining and tuning on a target session. All tuning is end-to-end.
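The staged workflow in panels B and C can be read as two short training loops: continued masked-autoencoding on the target session, followed by an end-to-end supervised probe. The sketch below builds on the hypothetical MaskedSpikeAutoencoder sketch under Figure 1 and uses placeholder data loaders; it is an illustrative reading of the procedure, not the released training code.

import copy
import torch
import torch.nn as nn

def unsupervised_tune(pretrained, target_loader, steps=200, lr=1e-4):
    # Stage 2: continue masked-autoencoding training on the target session only.
    model = copy.deepcopy(pretrained)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (spikes, session_id, subject_id) in zip(range(steps), target_loader):
        nll, _ = model(spikes, session_id, subject_id)
        opt.zero_grad()
        nll.backward()
        opt.step()
    return model

def train_supervised_probe(tuned, labeled_loader, behavior_dim=2,
                           steps=200, lr=1e-4):
    # Stage 3: train a behavior decoder (e.g. cursor velocity) on encoder
    # outputs. Per panel C, tuning is end to end: both the probe and the
    # tuned encoder receive gradients. (A full implementation would also
    # disable input masking for this stage.)
    probe = nn.Linear(tuned.rate_head.in_features, behavior_dim)
    opt = torch.optim.AdamW(list(tuned.parameters()) + list(probe.parameters()),
                            lr=lr)
    for _, (spikes, session_id, subject_id, behavior) in zip(range(steps),
                                                             labeled_loader):
        _, latent = tuned(spikes, session_id, subject_id)
        pred = probe(latent)  # (batch, time_bins, behavior_dim)
        loss = nn.functional.mse_loss(pred, behavior)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return tuned, probe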
Figure 3.
NDT2 enables pretraining over multi-session, multi-subject, and multi-task data. We show unsupervised and supervised performance (mean of 5 sessions, SEM intervals over 3 seeds) on sorted (A) and unsorted (B) spiking activity. Higher is better for R2; lower is better for negative log-likelihood (NLL). Pretraining data are size-matched at 20Ks, except for from-scratch single-session data. NDT2 improves with pretraining on all data sources, whereas stitching is ineffective. NDT1 aggregation is helpful but does not apply beyond session transfer. A reference well-tuned decoding score is estimated from the rEFH model [47].
Figure 4. Scaling of transfer on RTT.
We compare supervised R2 (A) and unsupervised NLL (B) scaling as we increase the pretraining dataset size. Each point is a model; non-single-session models are calibrated to evaluation sessions with 100 trials. All pretraining improves over the leftmost 100-trial single-session from-scratch model, though scaling benefits vary by data source. C. We seek a convergence point between pretraining and training from scratch as we increase the number of trials used in the target context. Models converge by 3K trials.
Figure 5. Tuning and evaluating pretrained decoders.
A. In offline experiments, we compare a multi-session pretrained model calibrated to a novel day against the current approaches of a pretrained model without adaptation (0-Shot) and from-scratch training (yellow, orange). Both supervised and unsupervised tuning outperform these strategies. Standard error is shown over 3 seeds. B. In human control pilot experiments, we evaluated models on 3 test days. “0-Shot” models use no data from test days. C. Average reach times in 40 center-out trials are shown for 3 decoders over 2–3 sessions. This average includes trials where the target is not acquired within 10 s, though this occurs only 1–2 times in 40 trials. Session NDT2 uses <250 trials of data, while Broad NDT2 also includes other human and monkey data. Pretrained models provide consistent 0-Shot control, while OLE [23] can sometimes fail (shown by X). Control improves with either unsupervised or supervised tuning. However, OLE appears to overtake NDT2 with supervision.

References

    1. Vyas Saurabh, Golub Matthew D, Sussillo David, and Shenoy Krishna V. Computation through neural population dynamics. Annual Review of Neuroscience, 43:249–275, 2020.
    2. Altan Ege, Solla Sara A, Miller Lee E, and Perreault Eric J. Estimating the dimensionality of the manifold underlying multi-electrode neural recordings. PLOS Computational Biology, 17(11):1–23, 2021. doi: 10.1371/journal.pcbi.1008591.
    3. O’Shea Daniel J, Duncker Lea, Goo Werapong, Sun Xulu, Vyas Saurabh, Trautmann Eric M, Diester Ilka, Ramakrishnan Charu, Deisseroth Karl, Sahani Maneesh, et al. Direct neural perturbations reveal a dynamical mechanism for robust computation. bioRxiv, pages 2022–12, 2022.
    4. Sadtler Patrick T, Quick Kristin M, Golub Matthew D, Chase Steven M, Ryu Stephen I, Tyler-Kabara Elizabeth C, Yu Byron M, and Batista Aaron P. Neural constraints on learning. Nature, 512(7515):423–426, 2014.
    5. Vyas Saurabh, Even-Chen Nir, Stavisky Sergey D, Ryu Stephen I, Nuyujukian Paul, and Shenoy Krishna V. Neural population dynamics underlying motor learning transfer. Neuron, 97(5):1177–1186, 2018.
