A neural network model for the evolution of learning in changing environments

Magdalena Kozielska et al. PLoS Comput Biol. 2024 Jan 30;20(1):e1011840. doi: 10.1371/journal.pcbi.1011840. eCollection 2024 Jan.
Abstract

Learning from past experience is an important adaptation, and theoretical models may help to understand its evolution. Many of the existing models study simple phenotypes and do not consider the mechanisms underlying learning, while the more complex neural network models often make biologically unrealistic assumptions and rarely consider evolutionary questions. Here, we present a novel way of modelling learning using small neural networks and a simple, biology-inspired learning algorithm. Learning affects only part of the network, and it is governed by the difference between expectations and reality. We use this model to study the evolution of learning under various environmental conditions and different scenarios for the trade-off between exploration (learning) and exploitation (foraging). Efficient learning readily evolves in our individual-based simulations. However, in line with previous studies, the evolution of learning is less likely in relatively constant environments, where genetic adaptation alone can lead to efficient foraging, or in short-lived organisms that cannot afford to spend much of their lifetime on exploration. Once learning does evolve, the characteristics of the learning strategy (i.e., the duration of the learning period and the learning rate) and the average performance after learning are surprisingly little affected by the frequency and/or magnitude of environmental change. In contrast, an organism's lifespan and the distribution of resources in the environment have a clear effect on the evolved learning strategy: a shorter lifespan or a broader resource distribution leads to fewer learning episodes and a larger learning rate. Interestingly, a longer learning period does not always lead to better performance, indicating that the evolved neural networks differ in the effectiveness of learning. Overall, however, we show that a biologically inspired, yet relatively simple, learning mechanism can evolve to produce efficient adaptation in a changing environment.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. The neural network used in this study.
Our network receives a cue C as input and produces an output Qp, the predicted quality of a food item emitting that cue. In this model we use a network with one input and one output (C and Qp, respectively) and two hidden layers, each with four nodes. Arrows indicate the information flow in the network. Solid arrows represent genetically hardwired connections that do not change during learning. Dashed arrows represent weights that are genetically determined but can also change during learning (see text for more details).
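The architecture and the error-driven update described in the abstract might be sketched as follows. This is only a minimal illustration, not the authors' code: the tanh activation, the random weight initialisation, and the placement of the four plastic weights on the output connections are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

class ForagingNet:
    """Sketch of a 1-4-4-1 feedforward network; only four weights are plastic."""

    def __init__(self):
        # Genetically hardwired connections (solid arrows): fixed during learning.
        self.W1 = rng.normal(size=(4, 1))  # input -> hidden layer 1
        self.W2 = rng.normal(size=(4, 4))  # hidden layer 1 -> hidden layer 2
        # "Learning weights" (dashed arrows): genetically seeded but plastic.
        self.w_out = rng.normal(size=4)    # hidden layer 2 -> output

    def predict(self, cue):
        """Predicted quality Qp of a food item emitting cue C."""
        self.h1 = np.tanh(self.W1 @ np.array([cue]))
        self.h2 = np.tanh(self.W2 @ self.h1)
        return float(self.w_out @ self.h2)

    def learn(self, cue, true_quality, rate=0.5):
        """Delta-rule-style update driven by the expectation-reality mismatch."""
        error = true_quality - self.predict(cue)
        self.w_out += rate * error * self.h2

net = ForagingNet()
before = abs(0.8 - net.predict(0.3))
for _ in range(20):            # 20 learning episodes on the same cue
    net.learn(0.3, 0.8)
after = abs(0.8 - net.predict(0.3))
assert after < before          # the prediction error shrinks with learning
```

The learning rate of 0.5 echoes the value that the paper reports as typically evolving, but the update rule here is a generic delta rule, standing in for the biology-inspired algorithm detailed in the text.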
Fig 2
Fig 2. Relationship between food cues and food quality at three different points in time.
Three Gaussian functions with σ = 0.25 illustrate the “environmental profile” (Eq 3) at three time points. The environment changes via a shift of the peak P of the profile.
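A shifting Gaussian environmental profile of this kind might be sketched as below. The unit-height Gaussian is an assumption; the exact form of Eq 3 is given in the paper.

```python
import numpy as np

def environmental_profile(cue, peak, sigma=0.25):
    """Quality of a food item emitting `cue`, for a profile peaked at `peak`."""
    return float(np.exp(-((cue - peak) ** 2) / (2 * sigma ** 2)))

peak = 0.5                            # current position of the quality peak P
best_cue = 0.5                        # cue that is currently most profitable
quality_before = environmental_profile(best_cue, peak)

peak += 0.4                           # environmental change of magnitude m = 0.4
quality_after = environmental_profile(best_cue, peak)

assert quality_before > quality_after  # the old optimum is devalued by the shift
```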
Fig 3
Fig 3. Effect of a fixed number of learning episodes (LE) on (A) evolved network performance and (B) lifetime energy gain in different environmental regimes.
Panels in different columns correspond to different frequencies of environmental change f, ranging from 0.01 (a change once every 100 generations) to 1.0 (a change every generation). The x-axis of each panel represents the magnitude of environmental change: the distance that the environmental quality peak moves when change occurs. 20 replicate simulations were run for each parameter combination (in all cases, lifespan = 500). Each replicate is represented by a coloured point, which corresponds to the population mean of that replicate, averaged over the last 2000 generations. The lines connect the median values of the 20 replicates for each parameter setting. As expected, performance tends to increase with the number of learning episodes. However, the total amount of resources gained tends to be highest for an intermediate number of learning episodes, because a longer learning period reduces the time left for foraging.
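The trade-off noted in the last sentence can be illustrated with a toy calculation (purely hypothetical numbers, not the model's actual performance curve): if performance saturates with the number of learning episodes while every episode is a timestep not spent foraging, lifetime gain peaks at an intermediate LE.

```python
lifespan = 500  # timesteps, as in Fig 3

def performance(le):
    # Hypothetical saturating curve: more learning episodes, better foraging.
    return 1 - 0.5 * 0.8 ** le

def lifetime_gain(le):
    # Each learning episode reduces the time left for foraging.
    return (lifespan - le) * performance(le)

gains = {le: lifetime_gain(le) for le in (0, 5, 10, 20, 100)}
best = max(gains, key=gains.get)
assert 0 < best < 100  # an intermediate learning period maximises lifetime gain
```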
Fig 4
Fig 4. The time course of network performance in a changing environment.
The panels show the time course of average population performance over the last 100 generations of simulations with an environmental change rate f = 0.1 (change once every 10 generations) and magnitude m = 0.4. For four values of the number of learning episodes (LE = 0, 5, 10, 20) four randomly chosen replicate simulations are shown. In the absence of learning (LE = 0), the population performance clearly drops to low levels every time the environment changes (indicated by vertical dashed lines). With an increasing number of learning episodes, the drops in performance are smaller, and performance is better throughout the simulation.
Fig 5
Fig 5. Prediction profiles of networks evolved for different durations of the learning period.
For each of the 16 populations presented in Fig 4, one network was chosen at random at the end of the simulation (when the environment had just changed) and was investigated in more detail. The plots show the environmental quality function (blue), the prediction profile (i.e., the quality predicted for each possible cue) of the network before learning had started (green) and at the end of the learning period (red). For longer learning periods, the “learned” prediction profiles (red curve) match the “true” environment quality profile (blue curve) reasonably well, even though the “innate” prediction profile (green curve) is way off target. Parameter settings as in Fig 4 (f = 0.1, m = 0.4).
Fig 6
Fig 6. Examples of evolved neural networks and their prediction profiles before and after learning.
Networks from three different replicates of simulations with LE = 20 are shown (three of the four individuals shown in Fig 5). Blue arrows correspond to excitatory connections (positive weights) and red arrows to inhibitory connections (negative weights). The thickness of the lines is proportional to the strength of the connection. The baseline activation of each node is represented by circles with a blue edge for positive values and by diamonds with a red edge for negative values. The absolute strength of the baseline activation is given by the inner shading of the symbol: the darker the colour, the larger the value. During learning, only the four numbered weights can change. For example, for the network in the centre, two weights changed both the strength and the type of the connection (weight 2 from excitatory to inhibitory, and weight 3 the other way round), and weight 4 weakened in strength. Such relatively small changes to the network lead to a drastic change in the prediction profile (right column, plotting convention as in Fig 5). Note that the four "learning weights" of the networks tend to have smaller absolute values (thinner lines) than the other weights. This was a common pattern for networks that evolved efficient learning (Fig C in S1 Appendix).
Fig 7
Fig 7. Joint evolution of (A) network performance, (B) duration of the learning period, and (C) learning rate for various environmental scenarios.
Parameter settings and graphical conventions are as in Fig 3. In (A), the performance of the evolved networks in the simulations in which learning was allowed to evolve is shown in turquoise. For comparison, the simulations in Fig 3A where learning was not allowed to evolve (LE = 0) are also shown (in red). (B) shows the evolved number of learning episodes. Notice that LE often evolves toward zero (i.e., learning disappears in the course of evolution) when the magnitude of change is small and environmental change is infrequent. (C) shows the evolved learning rates; different colours indicate the association between the evolved learning rate and the evolved duration of the learning period in the replicate simulations. Notice that the evolved learning rate is close to 0.5 in all simulations where learning evolved (LE > 10). When learning disappeared in the course of evolution (LE < 5), the learning rate is no longer under strong selection and can take on many different values (eight data points with LE < 5 are not visible, as their learning rate exceeds 3). The points are semi-transparent; darker spots indicate that multiple replicates evolved the same value.
Fig 8
Fig 8. The relationship between performance and the number of learning episodes in different environmental regimes.
Different columns correspond to different frequencies of environmental change (f) and different rows to different magnitudes of environmental change (m). Each point represents the average of the population mean over the last 2000 generations of a single replicate. The results of four simulations (all for f = 0.01) with a low average LE (< 3) are not visible, as their average performance was below 0.80. The points are semi-transparent; darker spots indicate that multiple replicates evolved the same values.
Fig 9
Fig 9. Effect of lifespan on the evolution of learning.
For two lifespans (50 timesteps: red; 500 timesteps: blue) each panel shows the evolved relationship between the learning rate and the number of learning episodes in 40 replicate simulations. The panels correspond to different environmental regimes: the columns show three frequencies of environmental change (f) and the rows three magnitudes of change (m). Each point represents the average of the population mean over the last 2000 generations of a single replicate. For clarity, only learning rates up to 3.0 are shown; 36 data points with a learning rate above 3.0, all with a very low number of learning episodes (= no learning), are not visible.
Fig 10
Fig 10. Effect of lifespan on performance.
Results for two lifespans (50 timesteps: red; 500 timesteps: blue) and nine environmental regimes (defined by the rate f and the magnitude m of change) are shown. The average population performance in the last 2000 generations of 40 replicate simulations for each parameter set is indicated by coloured dots.
Fig 11
Fig 11. Effect of lifespan and the width of the quality distribution on the evolution of learning.
Rows correspond to different values of σ (sigma), that is, to different widths of the quality function. Graphical conventions as in the previous figures.
