Biol Rev Camb Philos Soc. 2022 Oct;97(5):1999-2021.
doi: 10.1111/brv.12879. Epub 2022 Jul 4.

A reaction norm framework for the evolution of learning: how cumulative experience shapes phenotypic plasticity

Jonathan Wright et al. Biol Rev Camb Philos Soc. 2022 Oct.

Abstract

Learning is a familiar process to most people, but it currently lacks a fully developed theoretical position within evolutionary biology. Learning (memory and forgetting) involves adjustments in behaviour in response to cumulative sequences of prior experiences or exposures to environmental cues. We therefore suggest that all forms of learning (and some similar biological phenomena in development, aging, acquired immunity and acclimation) can usefully be viewed as special cases of phenotypic plasticity, and formally modelled by expanding the concept of reaction norms to include additional environmental dimensions quantifying sequences of cumulative experience (learning) and the time delays between events (forgetting). Memory therefore represents just one of a number of different internal neurological, physiological, hormonal and anatomical 'states' that mediate the carry-over effects of cumulative environmental experiences on phenotypes across different time periods. The mathematical and graphical conceptualisation of learning as plasticity within a reaction norm framework can easily accommodate a range of different ecological scenarios, closely linking statistical estimates with biological processes. Learning and non-learning plasticity interact whenever cumulative prior experience causes a modification in the reaction norm (a) elevation [mean phenotype], (b) slope [responsiveness], (c) environmental estimate error [informational memory] and/or (d) phenotypic precision [skill acquisition]. Innovation and learning new contingencies in novel (laboratory) environments can also be accommodated within this approach. A common reaction norm approach should thus encourage productive cross-fertilisation of ideas between traditional studies of learning and phenotypic plasticity. 
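The four routes (a) to (d) above can be made concrete with a small sketch in which a linear reaction norm has an elevation, a slope, an error in perceiving the environment and an error in expressing the phenotype, each shifting with cumulative experience n. The exponential approach curve, the naive and experienced values, and the names `approach`, `expected_phenotype` and `sample_phenotype` are hypothetical choices for illustration, not the authors' model.

```python
import math
import random


def approach(start: float, end: float, n: int, rate: float) -> float:
    """Value after n prior exposures, moving from `start` toward `end`.

    Hypothetical exponential form: each exposure has less effect than the last.
    """
    return end + (start - end) * math.exp(-rate * n)


# Hypothetical (naive, experienced) values for each reaction norm component.
ELEVATION = (0.0, 1.0)         # (a) mean phenotype
SLOPE = (0.2, 1.0)             # (b) responsiveness
ENV_ERROR_SD = (1.0, 0.1)      # (c) error in perceived E (informational memory)
PHENO_ERROR_SD = (0.5, 0.05)   # (d) error in expressed y (skill acquisition)


def expected_phenotype(env: float, n: int, rate: float = 0.3) -> float:
    """Noise-free reaction norm after n exposures: elevation + slope * E."""
    elevation = approach(*ELEVATION, n, rate)
    slope = approach(*SLOPE, n, rate)
    return elevation + slope * env


def sample_phenotype(env: float, n: int, rng: random.Random, rate: float = 0.3) -> float:
    """One noisy expression: error enters on the x-axis (c) and the y-axis (d)."""
    elevation = approach(*ELEVATION, n, rate)
    slope = approach(*SLOPE, n, rate)
    perceived_env = env + rng.gauss(0.0, approach(*ENV_ERROR_SD, n, rate))
    expression_noise = rng.gauss(0.0, approach(*PHENO_ERROR_SD, n, rate))
    return elevation + slope * perceived_env + expression_noise
```

Drawing many values of `sample_phenotype(0.0, 0, rng)` and `sample_phenotype(0.0, 50, rng)` shows both a shifted mean (a) and a much narrower spread around the reaction norm (c, d) once experience has accumulated.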
As an example, we model the evolution of plasticity with and without learning under different levels of environmental estimation error to show how learning works as a specific adaptation promoting phenotypic plasticity in temporally autocorrelated environments. Our reaction norm framework for learning and analogous biological processes provides a conceptual and mathematical structure aimed at usefully stimulating future theoretical and empirical investigations into the evolution of plasticity across a wider range of ecological contexts, while providing new interdisciplinary connections regarding learning mechanisms.

Keywords: behavioural flexibility; behavioural plasticity; developmental plasticity; habituation curves; learning rules; phenotypic equation; state-dependence.

Figures

Fig. 1
Illustrations of a single individual's unidimensional and multidimensional reaction norms for non‐learning phenotypic plasticity in response to environmental variation (blue reaction norms), and learning plasticity as a result of a cumulative sequence of prior exposures (red reaction norms). (A) Non‐learning unidimensional plasticity as a linear response to the mean‐centred environmental variable, E1 (e.g. foraging effort with increasing prey profitability), with the elevation Ȳ representing the mean phenotypic value for that individual in its average environmental condition X̄. (B) Non‐learning multidimensional plasticity in response to two environmental variables, E1 and E2, with an interaction between them producing a warped reaction norm surface (e.g. predation threat and the need for vigilance moderating the positive effect of prey profitability on foraging effort). (C) Learning unidimensional plasticity following a particular sequence of evenly spaced prior exposures in which the behaviour decreases non‐linearly – i.e. the effect per exposure declines with increasing prior experience (e.g. exponential effects of habituation to a benign novel object near a food source). (D) Learning multidimensional plasticity with the effect of cumulative experience from a sequence of events interacting in response to some additional environmental effect, E (e.g. habituation to a benign novel object taking longer with an increasing perceived predation threat). The blue and red lines thus represent unidimensional reaction norms in A and C, and reaction norm surface values at the mean‐centred (zero) values of the environmental axes E2 in B and E in D. The darker shading of the grey reaction norm surfaces represents higher phenotypic values in B and later phenotypic expressions in D. See main text for more details, but note that the particular cases here were chosen for the purposes of illustration. 
In real systems, non‐learning plasticity reaction norms A and B can also be non‐linear, whilst learning reaction norms C and D can be linear, and both may involve more than two (x‐axis) environmental effects.
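As a sketch of panels A to D, each reaction norm can be written as a simple formula. The linear and exponential forms, the interaction coefficient `b12`, and the way perceived threat scales the habituation rate are hypothetical choices for illustration; as the caption notes, real reaction norms may take other shapes.

```python
import math


def mean_centre(values):
    """Centre an environmental variable so that E = 0 is the average condition."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def norm_1d(e1: float, elevation: float, slope: float) -> float:
    """Panel A: linear non-learning reaction norm; elevation is y at E1 = 0."""
    return elevation + slope * e1


def norm_2d(e1: float, e2: float, elevation: float,
            b1: float, b2: float, b12: float) -> float:
    """Panel B: the E1 x E2 interaction term b12 warps the surface."""
    return elevation + b1 * e1 + b2 * e2 + b12 * e1 * e2


def habituation(n: int, y0: float, y_inf: float, rate: float) -> float:
    """Panel C: response after n evenly spaced exposures; each has less effect."""
    return y_inf + (y0 - y_inf) * math.exp(-rate * n)


def habituation_under_threat(n: int, threat: float, y0: float, y_inf: float,
                             base_rate: float, threat_effect: float) -> float:
    """Panel D: higher perceived threat lowers the rate, so habituation takes longer.

    The exponential dependence of the rate on threat is a hypothetical choice.
    """
    rate = base_rate * math.exp(-threat_effect * threat)
    return habituation(n, y0, y_inf, rate)
```

With a negative `b12`, for example, a high value of E2 (predation threat) flattens the response to E1 (prey profitability), which is the kind of warping panel B depicts.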
Fig. 2
Illustrations of multidimensional curvilinear learning reaction norms showing different rates of phenotypic plasticity due to the cumulative experience of (i) a sequence of prior (reinforcing) exposures versus (ii) the length of delays between successive exposures in: (A) kin discrimination ability (e.g. in affiliation behaviour towards kin versus non‐kin) as a result of imprinting requiring usually only one or two prior exposures with little forgetting and hence no effect of time delays; (B) conditioned taste aversion requiring only a small number of prior exposures but with less effect if there are longer delays between those events; (C) habituation to a benign novel object occurring only after a long sequence of similar events and with dishabituation increasing following longer delays between such events; and (D) foraging success increasing via slow positive reinforcement (or associative) learning due to experiencing many events in a row with forgetting happening on a similar timescale following increasing delays between events without reinforcement. The darker shading in these grey learning reaction norm surfaces represents the more diminished changes in behaviour in later phenotypic expressions. See main text for details, but note that the particular cases here were chosen for the purposes of illustration. In real systems, these aspects of learning reaction norms can also be linear, and may involve more than just two environmental (x‐axis) effects.
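The two axes of these learning surfaces, number of prior exposures and delay between them, can be sketched with a simple learn-then-forget rule. The rule itself (each exposure closes a fixed fraction of the gap to full learning, and the learnt value decays exponentially during each delay) and the parameter values loosely mirroring panels A to D are hypothetical, chosen only to reproduce the qualitative contrasts in the caption.

```python
import math


def learned_response(n_exposures: int, delay: float,
                     learn_rate: float, forget_rate: float) -> float:
    """Learnt change in behaviour (0 = naive, 1 = fully learnt) just after the last exposure.

    Hypothetical rule: each exposure closes a fraction `learn_rate` of the gap to 1;
    between exposures the learnt value decays at `forget_rate` per unit time.
    """
    m = 0.0
    for i in range(n_exposures):
        if i > 0:
            m *= math.exp(-forget_rate * delay)  # forgetting since the previous exposure
        m += learn_rate * (1.0 - m)             # learning from this exposure
    return m


# Hypothetical parameter sets loosely mirroring panels A-D.
SCENARIOS = {
    "A imprinting": dict(learn_rate=0.9, forget_rate=0.0),       # fast, no forgetting
    "B taste aversion": dict(learn_rate=0.7, forget_rate=0.05),  # fast, some forgetting
    "C habituation": dict(learn_rate=0.1, forget_rate=0.2),      # slow, dishabituation
    "D foraging": dict(learn_rate=0.05, forget_rate=0.05),       # slow learning and forgetting
}


def surface(learn_rate: float, forget_rate: float,
            exposures=range(1, 11), delays=(1.0, 5.0, 10.0)):
    """Tabulate the learning reaction norm over (number of exposures, delay)."""
    return {(n, d): learned_response(n, d, learn_rate, forget_rate)
            for n in exposures for d in delays}
```

For habituation (panel C) the observed behaviour would be the baseline response scaled down by the learnt value, so longer delays (more forgetting) correspond to dishabituation.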
Fig. 3
Conceptual representations of different ways that ‘learning’ can affect phenotypic values in the context of non‐learning reaction norms. For the purposes of illustration, potentially multidimensional reaction norms have been simplified here into two‐dimensional representations of linear non‐learning reaction norms (in blue), with dots (in red) denoting instances of phenotypic expression. The spacing of the reaction norms and dots indicates the expected non‐linear changes over time due to learning from successive prior exposures to the environment allowing the individual to arrive gradually and asymptotically at a new pattern of phenotypic expression, with both reaction norms and dots becoming progressively darker during this process of learning. Learning can affect non‐learning reaction norm (A) elevations (mean phenotype) and/or (B) slopes (responsiveness). It can also usefully reduce the degree of error (or residual variation) in instances of phenotypic expression away from optimal reaction norms (illustrated here with just two sets of orange arrows) in either (C) the x‐dimension, as informational memory from experiencing past environments is used progressively to improve the match between the perceived environmental value on the x‐axis and the true value (see Section VIII), and/or (D) the y‐dimension, as skills learnt from prior experiences increase the accuracy or precision of the appropriate phenotypic expression given the environment. 
In addition, we can simplistically represent (E) innovation as an extension of the reaction norm (dashed blue line) in response to novel environmental conditions (in purple) followed by reinforcement learning (based on pay‐offs) to refine the expression of a new optimum phenotypic value, and (F) the use of similar innovative learning as a first step in a specific example of reinforcement learning of a novel experimentally imposed optimal reaction norm (grey line or purple dichotomous choice) requiring a series of appropriate learnt behavioural responses via training to a particular novel contingency (green versus yellow options). See text for further explanation.
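The x‐dimension effect in panel C can be illustrated with the simplest case: if the environment stays constant and each exposure yields an independent, unbiased but noisy cue, then averaging the cues remembered so far shrinks the error in the perceived environmental value roughly as 1/√n. The constant environment, Gaussian cue noise and the name `estimate_error` are assumptions for this sketch.

```python
import random
import statistics


def estimate_error(n_cues: int, true_env: float = 0.0, cue_sd: float = 1.0,
                   trials: int = 2000, seed: int = 0) -> float:
    """Mean absolute x-axis error after averaging n remembered cues.

    Assumes a constant environment and independent Gaussian cue noise.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        cues = [true_env + rng.gauss(0.0, cue_sd) for _ in range(n_cues)]
        perceived = statistics.fmean(cues)  # informational memory as a running average
        errors.append(abs(perceived - true_env))
    return statistics.fmean(errors)
```

For Gaussian noise the expected absolute error is cue_sd · √(2/(π n)), about 0.80 for one cue and 0.20 for sixteen, so the simulated values should land close to these.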
Fig. 4
Flowchart of the individual‐based simulation model procedure for phenotypic plasticity. The individual's current perception of its environment, x_t (grey thought‐bubble), depends upon both its memory of past cues (left path) and the information it sampled during the current time step, Cue_t (right path), weighted by g_m and 1 − g_m respectively. The phenotype y_t is then plastically adjusted towards a match with the current perceived environment. See Appendix S1 for more details. Haploid bird illustration: Wikimedia Commons.
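The decision step in this flowchart can be sketched as a weighted average. Two details are assumptions rather than taken from the caption: that the "memory of past cues" is simply the previous perception x_(t−1), which makes memory an exponentially weighted average of all earlier cues, and that the phenotype is y_t = g_p · x_t, a linear reaction norm through the origin with slope g_p. Sampling effort g_s is not modelled here; Appendix S1 describes the actual model.

```python
def perceive(memory: float, cue_t: float, g_m: float) -> float:
    """x_t: memory weighted by g_m, current cue weighted by 1 - g_m."""
    return g_m * memory + (1.0 - g_m) * cue_t


def express(x_t: float, g_p: float) -> float:
    """y_t: plastic adjustment with reaction norm slope g_p (assumed linear, no intercept)."""
    return g_p * x_t


def lifetime(cues, g_m: float, g_p: float, initial_memory: float = 0.0):
    """Run one individual through a sequence of decision events."""
    memory = initial_memory
    phenotypes = []
    for cue_t in cues:
        x_t = perceive(memory, cue_t, g_m)
        phenotypes.append(express(x_t, g_p))
        memory = x_t  # assumption: memory carries the previous perception forward
    return phenotypes
```

With g_m = 0 the individual responds only to the current cue; as g_m grows, single noisy cues are damped but the response to genuine environmental change also lags.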
Fig. 5
Individual‐based simulation model results, showing evolved genetic values for: (A) plasticity or the slope of the reaction norm, g_p (g_p = 1 maximises phenotype–environment matching); (B) investment in learning in terms of a memory factor g_m for the use of knowledge regarding environmental conditions during previous decision events; and (C) sampling effort during the current decision or time event, g_s. Results are given according to variation in the reliability of environmental cues β (i.e. how correlated they are with the fitness‐impacting environmental factor), and how temporally autocorrelated the environmental factor itself is from one decision event to the next, α. Simulations involved a population of 200 individuals with 50 decision events or time steps per lifetime. Results are shown after 1000 generations averaged across 20 replicates per grid square. In the top panel in each case, the cost of plasticity = 0.05 per unit of reaction norm slope; cost of memory = 0.05 times the proportional use of the memory factor; and cost of sampling = 0.05 per unit sampling effort. In the bottom panel, the costs of plasticity and memory are the same, but with an increased cost of sampling = 0.2. See main text for explanation and Appendix S1 for more details on the model.
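To see why memory can pay in temporally autocorrelated environments with unreliable cues, the sketch below scores a fixed strategy (g_m, g_p) instead of evolving it. Assumptions not taken from the caption: the environment is a unit‐variance AR(1) process with autocorrelation α, the cue is β·E_t plus independent noise so that its correlation with E_t is β, memory is the previous perception, the fitness loss is squared mismatch, and sampling effort g_s is left out. Only the plasticity and memory cost coefficients (0.05) come from the caption; `simulate_mismatch` and `net_payoff` are hypothetical names.

```python
import math
import random


def simulate_mismatch(alpha: float, beta: float, g_m: float, g_p: float = 1.0,
                      steps: int = 50000, seed: int = 1) -> float:
    """Mean squared phenotype-environment mismatch for one fixed strategy."""
    rng = random.Random(seed)
    env = rng.gauss(0.0, 1.0)  # start in the stationary (unit-variance) distribution
    memory = 0.0
    total = 0.0
    for _ in range(steps):
        # AR(1) environment: autocorrelation alpha, stationary variance 1
        env = alpha * env + math.sqrt(1.0 - alpha ** 2) * rng.gauss(0.0, 1.0)
        # cue with correlation beta to the current environment
        cue = beta * env + math.sqrt(1.0 - beta ** 2) * rng.gauss(0.0, 1.0)
        perceived = g_m * memory + (1.0 - g_m) * cue
        memory = perceived
        total += (g_p * perceived - env) ** 2
    return total / steps


def net_payoff(alpha: float, beta: float, g_m: float, g_p: float = 1.0,
               cost_p: float = 0.05, cost_m: float = 0.05) -> float:
    """Negative mismatch minus linear costs of plasticity and memory."""
    return -simulate_mismatch(alpha, beta, g_m, g_p) - cost_p * g_p - cost_m * g_m
```

Under these assumptions, with α = 0.95, β = 0.3 and g_p = 1, the long‐run mismatch works out analytically to 2 − 2β = 1.4 without memory (g_m = 0) and roughly 0.67 with g_m = 0.8, so the extra memory cost of 0.04 is easily repaid.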
