Review

Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

Uri Hasson et al. Neuron. 2020 Feb 5;105(3):416-434. doi: 10.1016/j.neuron.2019.12.002.
Abstract

Evolution is a blind fitting process by which organisms become adapted to their environment. Does the brain use similar brute-force fitting processes to learn how to perceive and act upon the world? Recent advances in artificial neural networks have exposed the power of optimizing millions of synaptic weights over millions of observations to operate robustly in real-world contexts. These models do not learn simple, human-interpretable rules or representations of the world; rather, they use local computations to interpolate over task-relevant manifolds in a high-dimensional parameter space. Counterintuitively, similar to evolutionary processes, over-parameterized models can be simple and parsimonious, as they provide a versatile, robust solution for learning a diverse set of functions. This new family of direct-fit models presents a radical challenge to many of the theoretical assumptions in psychology and neuroscience. At the same time, this shift in perspective establishes unexpected links with developmental and ecological psychology.

Keywords: evolution; experimental design; interpolation; learning; neural networks.


Figures

Figure 1.
Direct-fit learning with dense sampling supports interpolation-based generalization. (A) An overly simplistic model will fail to fit the data. (B) The ideal-fit model will yield a good fit with few parameters in the context of data relying on a relatively simple generative process; in fact, this is the model used to generate the synthetic data (with noise) shown here. (C) An overly complex (i.e., over-parameterized) model may fixate on noise and yield an explosive overfit. Panels A-C capture the “textbook” description of underfitting and overfitting. (D) Complex models such as ANNs, however, can nonetheless yield a fit that both captures the training data and generalizes well to novel data within the scope of the training sample (see Panel G and Bansal et al., 2018, for a related discussion). (E) Traditional experimentalists typically use highly controlled data to construct rule-based, ideal-fit models with the hope that such models will generalize beyond the scope of the training set, into the extrapolation zone (real-life data). (F) Direct-fit models—like ANNs and, we argue, BNNs—rely on dense sampling to generalize using simple interpolation. Dense, exhaustive sampling of real-life events (which the field colloquially refers to as “big data”) effectively expands the interpolation zone so as to mimic idealized extrapolation. (G) A direct-fit model will generalize well to novel examples (black triangles) in the interpolation zone, but will not generalize well in the extrapolation zone.
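To make the underfitting/overfitting contrast and the interpolation/extrapolation distinction concrete, the following minimal Python sketch uses polynomial regression as a stand-in for the models cartooned in panels A-C and E-G. The quadratic generative process, noise level, polynomial degrees, and train/test ranges are our own illustrative choices, not taken from the paper.

```python
import numpy as np

# Illustrative sketch (not from the paper): a simple quadratic generative
# process stands in for panel B's "relatively simple generative process",
# and polynomial degree stands in for model complexity.
rng = np.random.default_rng(0)

def truth(x):
    return 0.5 * x**2  # hypothetical ground-truth generative process

x_train = np.linspace(-3, 3, 30)                              # dense sampling of the training range
y_train = truth(x_train) + rng.normal(0, 0.3, x_train.size)   # observations with noise

x_interp = np.linspace(-3, 3, 200)  # novel points inside the training range (interpolation zone)
x_extrap = np.linspace(4, 8, 200)   # points beyond the training range (extrapolation zone)

for degree, label in [(1, "underfit (A)"), (2, "ideal fit (B)"), (15, "overfit (C)")]:
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)  # least-squares fit
    mse_in = np.mean((model(x_interp) - truth(x_interp)) ** 2)
    mse_out = np.mean((model(x_extrap) - truth(x_extrap)) ** 2)
    print(f"{label:>14}: interpolation MSE = {mse_in:8.3f} | extrapolation MSE = {mse_out:12.3f}")
```

The high-degree polynomial illustrates the explosive overfit of panel C; the benign behavior of over-parameterized ANNs within the training range (panel D) is closer to the sine-wave experiment of Figure 2, sketched below.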
Figure 2.
ANNs can only generalize within the interpolation zone. (A) Interpolation over space: A simple ANN model with three fully connected hidden layers was trained to predict the output of a sine function mapping x-axis to y-axis values. Training examples (green markers) were x values between −5 and 5 (comprising only even values). Predictions for test x values ranging from −15 to 15 (comprising only odd values) are indicated using blue markers. The ideal sine wave (from which the observations are sampled) is indicated by the black line. The model was able to generalize to new test examples not seen during training within the interpolation zone, but not within the extrapolation zone. (B) Interpolation over time: A simple recurrent ANN (LSTM) was trained to predict the sequence of forthcoming observations from a sine function. Training examples were sampled from the first half of sine wave sequences between 2.5 and 4.5 Hz. The trained model was supplied with test samples from the first half of a sequence (green markers) and predicted the subsequent values (blue markers). The model was able to generalize to new frequencies not seen during training within the interpolation zone, but not within the extrapolation zone. (C) Interpolation provides robust generalization in a complex world: Given a rich enough training set, the advantage of direct-fit interpolation-based learning becomes apparent, as the same ANN from panel A is able to learn an arbitrarily complex function (for which there is no ideal model).
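A rough re-implementation of the panel A experiment might look like the PyTorch sketch below. The caption specifies only a simple ANN with three fully connected hidden layers, so the layer widths, activation, optimizer, training schedule, and the dense sampling of the training range are our assumptions; panel B's recurrent (LSTM) variant is omitted here.

```python
import torch
import torch.nn as nn

# Sketch of Figure 2A under assumed hyperparameters: fit sin(x) on [-5, 5],
# then test inside (interpolation) and outside (extrapolation) that range.
torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Training examples drawn only from the interpolation zone.
x_train = torch.arange(-5.0, 5.0, 0.1).unsqueeze(1)
y_train = torch.sin(x_train)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

# Evaluate within and beyond the training range.
with torch.no_grad():
    x_in = torch.linspace(-5, 5, 200).unsqueeze(1)
    x_out = torch.linspace(5, 15, 200).unsqueeze(1)
    mse_in = loss_fn(model(x_in), torch.sin(x_in)).item()
    mse_out = loss_fn(model(x_out), torch.sin(x_out)).item()
print(f"interpolation MSE: {mse_in:.4f} | extrapolation MSE: {mse_out:.4f}")
```

The trained network tracks the sine wave closely between −5 and 5 but typically flattens out or drifts beyond that range, reproducing the qualitative pattern the caption describes.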
Figure 3.
Evolution by natural selection is a mindless optimization process by which organisms are adapted over many generations according to environmental constraints (i.e., an ecological niche). This artistic rendition of the phylogenetic tree highlights how all living organisms on earth can be traced back to the same ancestral organisms. Humans and other mammals descend from shrew-like mammals that lived more than 150 million years ago; mammals, birds, reptiles, amphibians, and fish share a common ancestor—aquatic worms that lived 600 million years ago; and all plants and animals derive from bacteria-like microorganisms that originated more than 3 billion years ago. Reproduced with permission from Leonard Eisenberg (https://www.evogeneao.com).
Figure 4.
The density and diversity of training examples determine the interpolation zone and allow ANNs to approximate the regions of the face-space manifold to which they are exposed. (A) The scope of exposure may range from controlled experimental stimuli (e.g., Guntupalli et al., 2016), to typical human exposure (Jenkins et al., 2018), to a biased sample of only Western faces (O’Toole et al., 2018), to the vast training sample supplied to FaceNet (Schroff et al., 2015). All of these are subsets of the entire face space. Note that the numbers of identities and observations indicated in panel A are crude approximations for illustrative purposes. (B) All facial variation in the world can be represented geometrically as locations on a manifold in an abstract, high-dimensional “face space” constrained by the physical properties of human physiognomy. (C) A simple schematic depiction of an ANN, which maps input images (e.g., pixel values for face images) through many hidden layers into a lower-dimensional embedding space. The network’s objective function quantifies the mismatch between the model’s output and the desired output (e.g., predicting the correct identity). Error signals are then propagated back through the network to adjust connection weights, incrementally optimizing the network to better perform the task specified by the objective function within the boundaries of the training data (interpolation zone). Note that modern ANNs have drastically more complex architectures than depicted in the schematic (e.g., convolutional layers). (D) Training an ANN such as FaceNet (Schroff et al., 2015) on a vast number of diverse face images yields an interpolation zone encompassing nearly all facial variation in the world (yellow; superhuman performance). However, training the exact same model on only Western faces will yield a constrained interpolation zone (green), and the model will generalize poorly to faces outside this interpolation zone (the “other-race effect”). When trained on a sparser sample representative of typical human exposure, the network will yield human-like performance (purple). Finally, if trained on impoverished data, the model will nonetheless interpolate well within the scope of this limited training set, but will fail to generalize beyond it. The interpolation zone is a result of the density and diversity of the training sample.
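The mapping described in panel C can be sketched in code as follows. This is a schematic of the caption's description, not FaceNet itself: the toy image size, layer widths, embedding dimensionality, random stand-in "faces", and the cross-entropy identity-classification objective are our illustrative assumptions (FaceNet is a far deeper convolutional network trained with a triplet loss; Schroff et al., 2015).

```python
import torch
import torch.nn as nn

# Schematic of Figure 4C: images -> hidden layers -> low-dimensional embedding,
# with an objective that scores identity prediction and backpropagated error
# signals that adjust the connection weights.
torch.manual_seed(0)

n_identities, embedding_dim = 10, 32

encoder = nn.Sequential(              # input image -> lower-dimensional embedding
    nn.Flatten(),
    nn.Linear(64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, embedding_dim),
)
classifier = nn.Linear(embedding_dim, n_identities)  # embedding -> identity logits

# Toy "face images" and identity labels standing in for a real training sample.
images = torch.rand(200, 1, 64, 64)
labels = torch.randint(0, n_identities, (200,))

# The objective function quantifies the mismatch between the model's output and
# the desired output (here, predicting the correct identity).
loss_fn = nn.CrossEntropyLoss()
params = list(encoder.parameters()) + list(classifier.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

for step in range(100):
    optimizer.zero_grad()
    logits = classifier(encoder(images))
    loss = loss_fn(logits, labels)
    loss.backward()   # error signals propagated back through the network
    optimizer.step()  # connection weights adjusted incrementally
```

In this framing, the interpolation zone of panel D is set entirely by what `images` and `labels` cover: the same architecture and objective yield superhuman, human-like, or impoverished generalization depending on the density and diversity of that training sample.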
