Nature. 2025 Apr;640(8058):470-477.
doi: 10.1038/s41586-025-08829-y. Epub 2025 Apr 9.

Foundation model of neural activity predicts response to new stimulus types

Eric Y Wang et al. Nature. 2025 Apr.

Abstract

The complexity of neural circuits makes it challenging to decipher the brain's algorithms of intelligence. Recent breakthroughs in deep learning have produced models that accurately simulate brain activity, enhancing our understanding of the brain's computational objectives and neural coding. However, it is difficult for such models to generalize beyond their training distribution, limiting their utility. The emergence of foundation models1 trained on vast datasets has introduced a new artificial intelligence paradigm with remarkable generalization capabilities. Here we collected large amounts of neural activity from visual cortices of multiple mice and trained a foundation model to accurately predict neuronal responses to arbitrary natural videos. This model generalized to new mice with minimal training and successfully predicted responses across various new stimulus domains, such as coherent motion and noise patterns. Beyond neural response prediction, the model also accurately predicted anatomical cell types, dendritic features and neuronal connectivity within the MICrONS functional connectomics dataset2. Our work is a crucial step towards building foundation models of the brain. As neuroscience accumulates larger, multimodal datasets, foundation models will reveal statistical regularities, enable rapid adaptation to new tasks and accelerate research.

Conflict of interest statement

Competing interests: A.S.T. and J.R. are co-founders of DataJoint Inc., in which they have financial interests. The other authors declare no competing interests.

Figures

Fig. 1
Fig. 1. ANN model of the visual cortex.
Top left, an in vivo recording session of excitatory neurons from several areas (V1, LM, RL and AL) and layers (layer 2/3 (L2/3), layer 4 (L4) and layer 5 (L5)) of the mouse visual cortex. Right, the architecture of the ANN model and the flow of information from inputs (visual stimulus, eye position, locomotion and pupil size) to outputs (neural activity). Underlined labels denote the four main modules of the ANN. For the modulation and core, the stacked planes represent feature maps. For the readout, the blue boxes represent the output features of the core at the readout position of the neuron, and the fanning black lines represent readout feature weights. The top of the schematic displays the neural activity for a sampled set of neurons. In vivo and in silico responses are shown for two example neurons. Stimulus adapted from Sports-1M Dataset (Andrej Karpathy; https://cs.stanford.edu/people/karpathy/deepvideo/); copyright 2014, IEEE, reprinted with permission from IEEE Proceedings, IEEE (CC BY 3.0).
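For readers who think in code, the flow from inputs through the four modules to predicted activity can be sketched roughly as below. This is a minimal structural sketch in PyTorch, not the authors' implementation: every submodule here is a trivial placeholder, the tensor shapes are assumptions for illustration, and the real perspective, modulation, core and readout architectures are specified in the paper's Methods.

import torch
import torch.nn as nn

class FourModuleSketch(nn.Module):
    """Structural sketch of perspective -> core -> readout with behavioural
    modulation. All submodules are placeholders, not the paper's networks."""

    def __init__(self, n_neurons=100, channels=16):
        super().__init__()
        self.perspective = nn.Identity()               # placeholder: warp stimulus into retinal view
        self.modulation = nn.Linear(2, channels)       # placeholder: encode pupil size + locomotion
        self.core = nn.Conv3d(1, channels, kernel_size=3, padding=1)  # placeholder spatiotemporal core
        self.readout = nn.Linear(channels, n_neurons)  # placeholder per-neuron readout

    def forward(self, video, behaviour):
        # video: (batch, 1, time, H, W); behaviour: (batch, time, 2) = pupil size, locomotion
        retinal = self.perspective(video)
        features = self.core(retinal)                  # (batch, channels, time, H, W)
        features = features.mean(dim=(-2, -1))         # pool over space -> (batch, channels, time)
        state = self.modulation(behaviour).permute(0, 2, 1)        # (batch, channels, time)
        return self.readout((features * state).permute(0, 2, 1))   # (batch, time, n_neurons)

model = FourModuleSketch()
activity = model(torch.rand(2, 1, 10, 36, 64), torch.rand(2, 10, 2))
print(activity.shape)  # torch.Size([2, 10, 100])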
Fig. 2
Fig. 2. Predictive accuracy of models trained on individual recording sessions.
a, Predictive accuracy (median CCnorm across neurons; Methods) of our model versus a previous state-of-the-art dynamic model of the mouse visual cortex (Sinz et al.). We trained and tested our model on the same set of data that was used in that work: V1 neuronal responses to natural videos from three mice. n refers to the number of neurons per mouse. **P < 0.01, paired two-sided t-test, t = 14.53, d.f. = 2. b, Predictive accuracy of our models versus the amount of data used for training for four new recording sessions and mice. For each recording session, training data were partitioned into 7 fractions ranging from 4 min to 76 min. Separate models (diamonds) were trained on the differing fractions of training data, and all were tested on the same held-out testing data. Models of the same mice are connected by lines. c, Predictive accuracy for each visual area from models that were trained on the full data. We did not find a statistically significant relationship between predictive accuracy and visual area (linear mixed effects model; NS, not significant by Wald test, P = 0.45, d.f. = 3).
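As a rough illustration of the headline metric in this figure, the snippet below computes a normalized correlation coefficient per neuron and then takes the median across neurons. The signal-power normalization used here is one common convention, not necessarily the paper's exact CCnorm estimator (defined in its Methods), and the data are random stand-ins.

import numpy as np

def cc_norm(pred, trials):
    """Hedged sketch: normalized correlation coefficient for one neuron.

    pred   -- model prediction, shape (T,)
    trials -- repeated in vivo responses to the same stimulus, shape (N_trials, T)
    """
    n, _ = trials.shape
    mean_resp = trials.mean(axis=0)
    # raw correlation between prediction and trial-averaged response
    cc_abs = np.corrcoef(pred, mean_resp)[0, 1]
    # signal-power estimate from repeated trials (one common estimator)
    signal_power = (np.var(trials.sum(axis=0), ddof=1)
                    - trials.var(axis=1, ddof=1).sum()) / (n * (n - 1))
    cc_max = np.sqrt(signal_power / np.var(mean_resp, ddof=1))
    return cc_abs / cc_max

# median across neurons, as plotted here (illustrative random data)
rng = np.random.default_rng(0)
preds = rng.normal(size=(100, 300))                            # 100 neurons x 300 time bins
resps = preds[:, None, :] + rng.normal(size=(100, 10, 300))    # 10 repeated trials per neuron
print(np.median([cc_norm(p, r) for p, r in zip(preds, resps)]))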
Fig. 3
Fig. 3. Predictive accuracy of foundation models.
a, Schematic of the training and testing paradigm. Natural video data were used to train: (1) a combined model of the foundation cohort of mice with a single foundation core; and (2) foundation models versus individual models of new mice. Stimulus adapted from Sports-1M Dataset (Andrej Karpathy; https://cs.stanford.edu/people/karpathy/deepvideo/); copyright 2014, IEEE, reprinted with permission from IEEE Proceedings, IEEE (CC BY 3.0). b′–g′, The models of the new mice were tested with stimuli comprising natural videos (b′; adapted from Pixabay image (https://pixabay.com/photos/black-and-white-tunnel-the-way-1730543/; CC0 Content)), natural images (c′; adapted from Pixabay image (https://pixabay.com/photos/butterfly-insect-meadow-491166/; CC0 Content)), drifting Gabor filters (d′), flashing Gaussian dots (e′), directional pink noise (f′) and random dot kinematograms (g′). b–g, Corresponding plots for b′–g′, respectively, show the predictive accuracy (median CCnorm across neurons) as a function of the amount of training data for foundation models versus individual models (grey) of the new mice (4 mice × 7 partitions of training data × 2 types of models = 56 models (diamonds)). Models of the same mouse and type (foundation or individual) are connected by lines. Number of neurons per mouse: 8,862, 8,014, 9,452 and 10,246, respectively.
Fig. 4
Fig. 4. Parametric tuning from foundation models.
a,b′,c′, Schematic of the experimental paradigm: foundation models of new mice (n = 3) were trained with natural videos, and estimates of parametric tuning were computed from in vivo and in silico responses to synthetic stimuli (directional pink noise (b′) and flashing Gaussian dots (c′)). Adapted from Sports-1M Dataset (Andrej Karpathy; https://cs.stanford.edu/people/karpathy/deepvideo/); copyright 2014, IEEE, reprinted with permission from IEEE Proceedings, IEEE (CC BY 3.0). b,c, In vivo and in silico estimates of an example neuron’s parametric tuning to orientation and direction (b) and spatial location (c). d,f,h, Binned scatter plots of in vivo and in silico estimates of orientation (d), direction (f) and spatial (h) selectivity indices. Grey colour bar indicates the number of neurons (n) in each bin. DSI, direction selectivity index; OSI, orientation selectivity index; SSI, spatial selectivity index. e,g,i, Density histograms of differences between in vivo and in silico estimates of preferred orientation (e), direction (g) and spatial location (i). Histograms containing increasingly selective groups of neurons thresholded by in silico OSI (e), DSI (g) and SSI (i) are stacked from top to bottom. Density histograms were produced via kernel density estimation using Scott’s bandwidth.
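The selectivity indices and density histograms in this figure can be illustrated with a small sketch. The vector-sum OSI/DSI below is one common convention rather than the paper's exact estimator (described in its Methods), the Scott's-bandwidth step uses scipy's gaussian_kde, and all data are illustrative.

import numpy as np
from scipy.stats import gaussian_kde

def osi_dsi(directions_deg, responses):
    """Global vector-sum selectivity indices (one common convention)."""
    theta = np.deg2rad(directions_deg)
    r = np.asarray(responses, dtype=float)
    osi = np.abs(np.sum(r * np.exp(2j * theta))) / r.sum()
    dsi = np.abs(np.sum(r * np.exp(1j * theta))) / r.sum()
    return osi, dsi

# toy example: 16 motion directions, a neuron tuned to 90 degrees
dirs = np.arange(0, 360, 22.5)
resp = np.exp(np.cos(np.deg2rad(dirs - 90)))   # hypothetical tuning curve
print(osi_dsi(dirs, resp))

# density histogram of in vivo minus in silico preferred-direction differences,
# estimated with Scott's bandwidth as in panels e, g and i (illustrative data)
deltas = np.random.default_rng(1).normal(0, 15, size=1000)
kde = gaussian_kde(deltas, bw_method="scott")
grid = np.linspace(-90, 90, 181)
density = kde(grid)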
Fig. 5
Fig. 5. The foundation model of the MICrONS volume relates neuronal function to structure and anatomy.
a, Schematic of a foundation model of the MICrONS mouse, trained on excitatory neuronal responses to natural videos. At the bottom, the readout at a single time point is depicted, showing the readout positions and feature weights for two example neurons. Adapted from Sports-1M Dataset (Andrej Karpathy; https://cs.stanford.edu/people/karpathy/deepvideo/); copyright 2014, IEEE, reprinted with permission from IEEE Proceedings, IEEE (CC BY 3.0). b, Meshes of two example neurons, reconstructed from serial electron microscopy. Inset, magnified view of the indicated area, showing a synapse between these two neurons, with the pre-synaptic axon in black and the post-synaptic dendrite in grey. Scale bar, 1 μm. c, Coloured scatter plots of readout positions of all neurons from a recording session of the MICrONS mouse, overlaid on a top-down view of the recording window with annotated visual areas (V1, LM, RL and AL) and boundaries. Plots are coloured by the x (left) and y (right) coordinates of the readout positions. Scale bar, 100 μm. a.u., arbitrary units. d, Confusion matrix of MICrONS visual areas predicted from readout feature weights, normalized per row. The diagonal represents the recall for each visual area. e, Confusion matrix of MICrONS excitatory neuron cell types predicted from readout feature weights, normalized per row. The excitatory neuron cell types are from Schneider-Mizell et al. The diagonal represents the recall for each cell type. f, Morphologies of different types of excitatory neurons. Two example neurons are shown for each excitatory neuron cell type. L5ET, layer 5 extratelencephalic-projecting neurons.
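A hedged sketch of how a row-normalized confusion matrix (whose diagonal gives per-class recall, as in panels d and e) can be produced from readout feature weights and class labels. The classifier, features and labels below are placeholders; the paper's actual decoding procedure is described in its Methods.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins: readout feature weights (n_neurons x n_features) and area labels
rng = np.random.default_rng(0)
weights = rng.normal(size=(2000, 128))
areas = rng.choice(["V1", "LM", "RL", "AL"], size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(weights, areas, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# row-normalized confusion matrix: each diagonal entry is that class's recall
cm = confusion_matrix(y_te, clf.predict(X_te),
                      labels=["V1", "LM", "RL", "AL"], normalize="true")
print(np.round(cm, 2))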
Extended Data Fig. 1
Extended Data Fig. 1. ANN perspective.
Schematic of the modeled perspective of the animal. a, The retina is modeled as points on a sphere receiving light rays that trace through the origin. An example light ray with polar angle θ and azimuthal angle ϕ is shown in red. b, The light ray is traced to a point (mx, my) on the monitor. Bilinear interpolation of the four pixels on the monitor surrounding (mx, my) produces the activation of a point (θ, ϕ) on the modeled retina. c, 9 examples of the modeled perspective from the left eye of an animal, with 3 horizontal rotations of the optical globe (abduction/adduction) × 3 vertical rotations (elevation/depression). The concentric circles indicate visual angles in degrees. (See Methods for details on the perspective network).
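The bilinear-interpolation step in panel b can be sketched as follows. Only the four-pixel interpolation at a fractional monitor coordinate (mx, my) is shown; the ray tracing from a retinal point (θ, ϕ) through the origin to the monitor depends on the monitor geometry given in the Methods and is not reproduced here.

import numpy as np

def bilinear_sample(image, mx, my):
    """Bilinear interpolation of the four monitor pixels surrounding (mx, my)."""
    x0, y0 = int(np.floor(mx)), int(np.floor(my))
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = mx - x0, my - y0
    return ((1 - wx) * (1 - wy) * image[y0, x0] +
            wx * (1 - wy) * image[y0, x1] +
            (1 - wx) * wy * image[y1, x0] +
            wx * wy * image[y1, x1])

frame = np.arange(16, dtype=float).reshape(4, 4)   # toy monitor frame
print(bilinear_sample(frame, 1.5, 2.25))           # activation of one retinal point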
Extended Data Fig. 2
Extended Data Fig. 2. ANN modulation.
Visualization of the modulation network’s output, projected onto 2 dimensions via UMAP. a, b show the same data from an example recording session and modulation network. Each point on the plot indicates a point in time from the recording session. The colors indicate measurements of pupil size (a) and treadmill speed (b) at the respective points in time. (See Methods for details on the modulation network).
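A minimal sketch of this kind of visualization, assuming the modulation network's per-time-point outputs are available as an array: project them to two dimensions with umap-learn and color the embedding by pupil size (a) and treadmill speed (b). All inputs below are random stand-ins.

import numpy as np
import umap                      # umap-learn package
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
latents = rng.normal(size=(5000, 64))   # hypothetical (time points, modulation features)
pupil_size = rng.random(5000)
treadmill_speed = rng.random(5000)

embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(latents)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, colour, label in zip(axes, (pupil_size, treadmill_speed),
                             ("pupil size", "treadmill speed")):
    sc = ax.scatter(embedding[:, 0], embedding[:, 1], c=colour, s=2)
    fig.colorbar(sc, ax=ax, label=label)
plt.show()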
Extended Data Fig. 3
Extended Data Fig. 3. Neural network lesion studies.
To determine the effect that various components of the model have on predictive accuracy, we performed lesion studies, in which we altered individual components of the model and evaluated the effect that the alteration had on model performance (CCabs). The left 4 columns (a-d, f-i, k-n, p-s) are scatterplots of reference vs lesioned model performance, with each column corresponding to a different mouse and each point corresponding to a neuron. The right-most column (e, j, o, t) displays density histograms of the performance difference between the reference and the lesioned models, plotted separately for each mouse, as well as the t-statistic and P values of paired two-sided t-tests. The first row (a-e) shows the effect of the perspective module on model performance, the second row (f-j) shows the effect of the modulation module, the third row (k-o) shows the effect of the convolution type (2D vs 3D) of the feedforward module, and the fourth row (p-t) shows the effect of the loss function: Poisson negative log likelihood (Poisson NLL) vs mean squared error (MSE).
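For the loss-function comparison in the fourth row, the two objectives can be written in a few lines of PyTorch. The tensor shapes below are assumptions for illustration; the paper's training details are in its Methods.

import torch
import torch.nn as nn

# Illustrative shapes: (batch, neurons, time); predictions are non-negative rates
pred = torch.rand(32, 100, 500)
target = torch.poisson(pred)     # surrogate spike-count-like targets

poisson_nll = nn.PoissonNLLLoss(log_input=False)   # predictions are rates, not log-rates
mse = nn.MSELoss()

print(poisson_nll(pred, target).item(), mse(pred, target).item())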
Extended Data Fig. 4
Extended Data Fig. 4. ANN performance: Individual vs. Foundation.
Predictive accuracy (median CCnorm across neurons) of foundation models (with the foundation core) vs. individual models (with cores trained on individual recording sessions). For the 4 mice in the 4 left columns, 1 recording session was performed, and those data were partitioned into 7 training/validation splits, which were used to train separate individual/foundation models. The predictive accuracy of those models (diamonds) is reported for 6 testing stimulus domains (rows). For the MICrONS mouse, 14 recording sessions were performed; for each recording session, a model was trained using nearly all (99%) of the data available for training/validation. The MICrONS models were only tested on natural movies, due to the lack of the other stimuli in those recording sessions. All models were trained only using natural movies.
Extended Data Fig. 5
Extended Data Fig. 5. Recurrent architecture: Conv-Lstm vs. CvT-Lstm.
We evaluated the performance of two different types of recurrent architectures for the core module: Conv-Lstm (blue) and CvT-Lstm (tan). For each architecture, a core was trained on 8 mice and then transferred to 4 new mice. For each of the new mice, 7 models were trained using varying amounts of natural movies, ranging from 4 to 76 minutes. The predictive accuracy (CCnorm) of these models was evaluated on 6 different stimulus domains: natural movies (a), natural images (b), drifting Gabor filters (c), flashing Gaussian dots (d), directional pink noise (e) and random dot kinematograms (f). Blue diamonds indicate models with the Conv-Lstm core, and tan diamonds indicate models with the CvT-Lstm core. For each architecture, models of the same mouse are connected by lines.
Extended Data Fig. 6
Extended Data Fig. 6. Pairwise similarities of readout feature weights of neurons from the MICrONS volume.
Here we examine the similarities of readout weights of the same or different neurons, from the same or different scans (recording sessions). In panels a–c, the similarities of readout weights are plotted for the following groups: same neuron from a different scan (y-axis of a), same neuron from the same scan (y-axis of b), different neuron from a different scan (x-axis of a, x-axis of c), different neuron from the same scan (x-axis of b and y-axis of c). The similarity between readout weights was measured inversely via the angular distance ∠(x, y) := arccos(⟨x, y⟩ / (‖x‖ ‖y‖)) / π, where (x, y) is a pair of readout weights. A similar pair of readout weights will exhibit a small ∠, and vice versa. The scatterplots a–c are colored by CCmax, which is an inverse measure of neuronal noise, i.e., the estimated maximum correlation coefficient that a model could achieve at predicting the mean response of the neuron (see Methods for details). For each neuron N, the ‘different’ neurons N’ were restricted to be ≤100 μm from N in terms of soma distance, and the distribution of the number of ‘different’ neurons is shown in d (from different scans) and e (from the same scan). f and g (corresponding to d and e, respectively) show the fraction of the nearby neurons N’ that are more similar to N in terms of readout weights than N is to itself across different scans. f, For 919 out of the 1013 neurons N, less than 0.05 of nearby neurons N’ from different scans had more similar readout weights. g, For 840 out of the 1013 neurons N, less than 0.05 of nearby neurons N’ from the same scan had more similar readout weights.
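The angular-distance similarity used here is straightforward to compute; a small sketch with made-up weight vectors follows.

import numpy as np

def angular_distance(x, y):
    """Angular distance arccos(<x, y> / (||x|| ||y||)) / pi between two
    readout-weight vectors; small values mean similar weights."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

w_scan1 = np.array([0.2, -1.3, 0.8, 0.1])    # illustrative readout weights, scan 1
w_scan2 = np.array([0.25, -1.1, 0.9, 0.0])   # same neuron, scan 2
print(angular_distance(w_scan1, w_scan2))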

References

    1. Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://doi.org/10.48550/arXiv.2108.07258 (2021).
    2. The MICrONS Consortium. Functional connectomics spanning multiple areas of mouse visual cortex. Nature https://doi.org/10.1038/s41586-025-08790-w (2025).
    3. Cadieu, C. F. et al. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comput. Biol. 10, e1003963 (2014).
    4. Batty, E. et al. Multilayer network models of primate retinal ganglion cells. In Proc. 5th International Conference on Learning Representations (ICLR, 2017).
    5. McIntosh, L. T., Maheswaranathan, N., Nayebi, A., Ganguli, S. & Baccus, S. A. Deep learning models of the retinal response to natural scenes. Adv. Neural Inf. Process. Syst. 29, 1369–1377 (2016).
