Proc IEEE Inst Electr Electron Eng. 2014 May;102(5):10.1109/JPROC.2014.2314297.
doi: 10.1109/JPROC.2014.2314297.

Prospective Optimization

Terrence J Sejnowski et al. Proc IEEE Inst Electr Electron Eng. 2014 May.

Abstract

Human performance approaches that of an ideal observer and optimal actor in some perceptual and motor tasks. These optimal abilities depend on the capacity of the cerebral cortex to store an immense amount of information and to flexibly make rapid decisions. However, behavior only approaches these limits after a long period of learning while the cerebral cortex interacts with the basal ganglia, an ancient part of the vertebrate brain that is responsible for learning sequences of actions directed toward achieving goals. Progress has been made in understanding the algorithms used by the brain during reinforcement learning, which is an online approximation of dynamic programming. Humans also make plans that depend on past experience by simulating different scenarios, which is called prospective optimization. The same brain structures in the cortex and basal ganglia that are active online during optimal behavior are also active offline during prospective optimization. The emergence of general principles and algorithms for goal-directed behavior has consequences for the development of autonomous devices in engineering applications.

Keywords: Basal ganglia; cerebral cortex; classical conditioning; dynamic programming; hippocampus; ideal observer; limbic system; optimization; reinforcement learning; temporal-difference learning.
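
The abstract describes reinforcement learning as an online approximation of dynamic programming. A minimal sketch of that relationship, under stated assumptions, is given below: it compares Bellman backups that use a known transition model with temporal-difference, TD(0), updates driven only by sampled transitions. The five-state chain task, the function names, and the parameter values are all hypothetical and chosen for illustration; none of them come from the paper.

```python
import random

N_STATES = 5   # hypothetical chain: states 0..4, followed by a terminal state
GAMMA = 0.9    # discount factor (assumed value)
ALPHA = 0.1    # TD learning rate (assumed value)


def step(state):
    """Deterministic chain: always move right; reward 1 on leaving the last state."""
    next_state = state + 1
    reward = 1.0 if next_state == N_STATES else 0.0
    return next_state, reward


def dynamic_programming(sweeps=100):
    """Bellman backups computed with full knowledge of the transition model."""
    values = [0.0] * (N_STATES + 1)  # values[N_STATES] is the terminal state, fixed at 0
    for _ in range(sweeps):
        for s in range(N_STATES):
            nxt, r = step(s)
            values[s] = r + GAMMA * values[nxt]
    return values[:N_STATES]


def td_zero(episodes=2000, seed=0):
    """TD(0): update value estimates online, one experienced transition at a time."""
    rng = random.Random(seed)
    values = [0.0] * (N_STATES + 1)
    for _ in range(episodes):
        s = rng.randrange(N_STATES)  # start each episode in a random state
        while s != N_STATES:
            nxt, r = step(s)
            td_error = r + GAMMA * values[nxt] - values[s]  # prediction error
            values[s] += ALPHA * td_error
            s = nxt
    return values[:N_STATES]


if __name__ == "__main__":
    print("DP   :", [round(v, 3) for v in dynamic_programming()])
    print("TD(0):", [round(v, 3) for v in td_zero()])
```

In this toy setting both routines should settle on nearly the same state values. The difference is that the dynamic-programming sweep needs the model in advance, whereas the TD(0) learner reaches its estimates incrementally from experience.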

Figures

Fig. 1
Hidden target task. (a) Blank screen is superimposed with the hidden target distribution that is learned over the session as well as sample eye traces from three trials for a participant. The first fixation of each trial is marked with a black circle. The final and rewarded fixation is marked by a shaded gray-scale circle. (b) The region of the screen sampled with fixation shrinks from the entire screen on early trials (blue circles; 87 fixations over the first five trials) to a region that approximates the size and position of the Gaussian-integer-distributed target locations on later trials (red circles; 85 fixations from trials 32–39). (c) Learning curves. The distance from the mean of the fixation cluster for each trial to the target centroid, averaged across participants, is shown in blue; green indicates the result of 200 simulations of the reinforcement-learning model for each participant’s parameters. The standard error of the mean is given for both. The ideal-observer prediction is indicated by the black dotted line. (d) The standard deviation of the eye position distributions, or “search spread,” is shown for the average of all participants (blue) and the reinforcement-learning model (green) with standard error of the mean. The dashed line is the ideal-observer theoretical optimum in each case, assuming perfect knowledge of the target distribution. (Adapted from [37].)
Fig. 2
Brain structures comprising the (left) basal ganglia and (right) limbic system. The cerebral cortex shown on the outside of the brain projects to the caudate and putamen of the dorsal (top) striatum, and nucleus accumbens of the ventral (bottom) striatum of the basal ganglia. The output of the basal ganglia from the globus pallidus projects to the thalamus, which then projects back to the cortex, forming a loop (see Fig. 3). The limbic system, which means “ring” and circles the thalamus, regulates emotion, behavior, motivation, long-term memory, and olfaction. The limbic system includes the cingulate cortex on the inside, or medial wall, of the cortex, the hippocampus, and the amygdala. (Courtesy of Paul Wissmann.)
Fig. 3
Schematic model of cortico–striatal loops. (Left) Model of the basal ganglia showing the direct pathway, which involves direct striatonigral inhibitory connections (dark green arrows) and promotes behavior, and the indirect pathway, which involves relays in the external globus pallidus (GPe) and subthalamic nucleus (STN), with the only excitatory projection in the basal ganglia (red arrow), and suppresses behavior. The balance between these two projections is thought to be regulated by afferent dopaminergic signals from the substantia nigra pars compacta (SNc) and the ventral tegmental area (VTA). (Top right) The connections between the cerebral cortex and the basal ganglia can be viewed as a series of parallel-projecting, largely segregated loops or channels conveying limbic (red), associative (yellow–green) and sensorimotor (blue–white) information. Functional territories represented at the level of cerebral cortex are maintained throughout the basal ganglia nuclei and thalamic relays. Black arrows indicate excitatory glutamatergic projections; gray arrows indicate GABA-ergic projections. (Bottom right) The spatially segregated “rostral caudal gradient” of human prefrontal cortical connectivity in the caudate, putamen, and pallidum. The color-coded ring denotes limbic (red), associative (yellow–green) and sensorimotor regions of the cerebral cortex in the sagittal plane. PFC: prefrontal cortex. (Adapted from [56].)
Fig. 4
Organization of cortico–basal ganglia networks. Schematic illustration showing cortico–basal ganglia networks in relation to serial adaptation. A shift from the associative to the sensorimotor cortico–basal ganglia network is observed during habit formation. DA: dopamine; DLS: dorsolateral striatum; DMS: dorsomedial striatum. (Adapted from [57].)

References

    1. Bellman RE. Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press; 1957.
    2. Rescorla RA, Wagner AR. A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement. New York, NY, USA: Appleton-Century-Crofts; 1972.
    3. Sutton RS, Barto AG. Toward a modern theory of adaptive networks: Expectation and prediction. Psychol. Rev. 1981 Mar;88:135–170.
    4. Sutton RS. Temporal credit assignment in reinforcement learning. Ph.D. dissertation. Amherst, MA, USA: Dept. Comput. Sci., Univ. Massachusetts; 1984.
    5. Barto AG, Sutton RS, Watkins CJCH. Learning and sequential decision making. In: Learning and Computational Neuroscience. Cambridge, MA, USA: MIT Press; 1990. pp. 539–602.
