PLoS Biol. 2016 Nov 15;14(11):e1002577.
doi: 10.1371/journal.pbio.1002577. eCollection 2016 Nov.

Prediction Errors but Not Sharpened Signals Simulate Multivoxel fMRI Patterns during Speech Perception

Helen Blank et al. PLoS Biol. 2016.

Abstract

Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. 
The interaction between prior expectation and sensory detail provides evidence for a Predictive Coding account of speech perception. Our work establishes methods that can be used to distinguish representations of Prediction Error and Sharpened Signals in other perceptual domains.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1.
Two computational models for how matching or neutral prior expectations influence processing of sensory signals at different levels of clarity: (A) Sharpened Signal model and (B) Prediction Error model. For both accounts, neural representations are derived by combining the sensory input with prior expectation. However, the underlying computations and information content in neural representations differ. (A) Sharpened Signal model: Prior expectation is used to multiply sensory input, leading to more specific representations for expected compared to unexpected sensory input (Sharpened Signals, SS). This leads to additive effects of sensory detail and matching prior expectation on the information content of neural representations. (B) Prediction Error model: Prior expectation is subtracted from the sensory input such that neural representations encode the difference between expected and actual input (Prediction Error, PE). This leads to an interaction between sensory detail and prior expectations, with most informative neural representations found when clearer signals follow neutral expectations, or when degraded signals match informative prior expectations. Critically, when clear signals match informative prior expectations, this produces a small and uninformative Prediction Error (Match 12-channel condition). The information content of neural representations (y-axis) contained in SS (A) and in PE (B) refers to the signal that is passed forward after the input and prior have been combined (bottom bars). This allows us to test which of these neural representations best describes measured fMRI pattern information. In each model, neural activity patterns are represented by greyscale values over sets of units. Negative Prediction Error values are shown with a white outline.
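The two computations described in this caption can be illustrated with a toy sketch. It is not the authors' simulation: the four-feature patterns, the `make_pattern` helper, and the clarity values 0.3 and 0.9 (standing in for 4- and 12-channel speech) are all assumptions made for illustration. Only the core operations, multiplying the input by the prior versus subtracting the prior from the input, come from the caption.

```python
import numpy as np

def make_pattern(clarity, n_features=4, target=0):
    """Hypothetical feature pattern: mixes a one-hot target with a flat pattern."""
    flat = np.full(n_features, 1.0 / n_features)
    one_hot = np.zeros(n_features)
    one_hot[target] = 1.0
    return clarity * one_hot + (1.0 - clarity) * flat

def sharpened_signal(sensory, prior):
    """Toy Sharpened Signal: multiply input by prior, then renormalise."""
    combined = sensory * prior
    return combined / combined.sum()

def prediction_error(sensory, prior):
    """Toy Prediction Error: subtract the prior from the input."""
    return sensory - prior

# Assumed stand-ins: a flat prior for "XXXX", a peaked prior for matching text.
priors = {"Neutral": make_pattern(0.0), "Match": make_pattern(0.9)}
inputs = {"4-channel": make_pattern(0.3), "12-channel": make_pattern(0.9)}

for prior_name, prior in priors.items():
    for input_name, sensory in inputs.items():
        ss_peak = sharpened_signal(sensory, prior)[0]                # weight on the target
        pe_size = np.abs(prediction_error(sensory, prior)).sum()     # total error magnitude
        print(f"{prior_name:8s}{input_name:11s} SS peak={ss_peak:.3f}  PE size={pe_size:.3f}")
```

In this toy setup the Sharpened Signal peak rises with both clarity and a matching prior, whereas the Prediction Error grows with clarity after a neutral prior and shrinks to zero when a clear input matches the prior, mirroring the additive versus interactive patterns the caption describes.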
Fig 2. Design and experimental conditions.
We used sparse imaging to record fMRI responses while participants saw written words, heard subsequently presented degraded spoken words, and said which word they had previously heard or read. We used two levels of sensory detail (4- and 12-channel) for presentation of the spoken words and conditions containing different pairings of written and spoken words: (1) matching written text + spoken words (“SING” + sing); (2) neutral written text (“XXXX”) + spoken words (e.g., fork); and (3) written-only text (“PASS”). Following 1/6 of all trials, participants were cued with a question mark to say aloud the previous written or spoken word. In addition, we inserted fixation crosses, null events, and trials in which written text partially or totally mismatched with spoken words (see Materials and Methods for details).
Fig 3. Comparison of behavioural and univariate fMRI results with model output.
(A) Behavioural results. Matching expectations and increased sensory detail improved perception of degraded spoken words. (B) Univariate results. Mean beta values extracted from the posterior STS (pSTS, MNI: x = -52, y = -38, z = 6) show reduced BOLD signal during Match conditions (solid) in contrast to Neutral conditions (open). Error bars for the empirical data indicate standard error of the mean after between-subject variability has been removed, which is appropriate for repeated-measures comparisons [62]. (C) Main effect of prior expectations rendered on a canonical brain (p < 0.05 voxelwise FWE, n = 21). White circle marks the region of interest in the posterior STS. (D/E) Sharpened Signal model (orange) and (F/G) Prediction Error model (blue). For comparison with the behavioural results (D/F) we assessed word recognition accuracy in the model based on the final lexical representation (i.e., which word the model selected as presented), and for comparison with the univariate results (E/G) we assessed the number of activation updates required to reach the stopping criterion. Error bars for both simulations indicate the standard error of the mean over 1,000 replications. Please refer to S1 Data at https://osf.io/2ze9n/ (doi: 10.17605/OSF.IO/2ZE9N) for the numerical values underlying these figures.
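The error bars mentioned in this caption remove between-subject variability before computing the standard error. One common way to do this is to centre each participant's data on the grand mean, sketched below. Whether this matches the exact procedure of reference [62] is an assumption, and the function name `within_subject_sem` and the example numbers are hypothetical.

```python
import numpy as np

def within_subject_sem(data):
    """SEM per condition after removing each subject's overall mean.

    data: array of shape (n_subjects, n_conditions).
    """
    data = np.asarray(data, dtype=float)
    subject_means = data.mean(axis=1, keepdims=True)
    # Centre every subject on the grand mean so only condition effects remain.
    normalised = data - subject_means + data.mean()
    return normalised.std(axis=0, ddof=1) / np.sqrt(data.shape[0])

# Made-up data: three subjects who differ only by a constant offset.
example = np.array([[1.0, 2.0],
                    [11.0, 12.0],
                    [21.0, 22.0]])
print(within_subject_sem(example))
```

Because the made-up subjects differ only in their overall level, the normalised standard error is zero for both conditions, even though the ordinary between-subject standard error would be large.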
Fig 4. Multivariate fMRI results and simulation.
(A) Hypothesized representational dissimilarity matrices. These four matrices were used to test similarity between words that share vowels within each of the four critical conditions (Match 4-channel, Neutral 4-channel, Match 12-channel, and Neutral 12-channel). Similarity between responses to identical items (on the main diagonal) was excluded, as was similarity between items in different conditions (“Not a Number” [NaN] values depicted in grey). Similarity between items containing the same vowel was predicted (zeroes in blue), whereas items containing different vowels were predicted to have more dissimilar representations (ones in red). These matrices are correlated with observed and simulated representational similarity. (B) RSA results. Fisher-z-transformed Spearman correlation coefficients for each of the four conditions in the left posterior STS (extracted from an independent ROI, [57]) show a significant interaction between sensory detail and prior expectation. Error bars indicate standard error of the mean after between-subject variability has been removed, which is appropriate for repeated-measures comparisons [62]. (C,D) Model comparison. Fisher-z-transformed Spearman correlation coefficients for each of the four conditions in the two models. (C) Sharpened Signal model (in orange) shows that both prior knowledge and sensory detail increase similarity for words that share the same vowel. (D) Prediction Error model (in blue) shows opposite effects of sensory detail in neutral and matching prior knowledge conditions, consistent with the RSA results (B). Error bars in (C) and (D) indicate standard error of the mean over 1,000 replications of these simulations. Please refer to S1 Data at https://osf.io/2ze9n/ (doi: 10.17605/OSF.IO/2ZE9N) for the numerical values underlying these figures.
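The comparison described in panels (A) and (B) can be sketched as follows: build a model matrix with zeros for item pairs sharing a vowel and ones otherwise, exclude the diagonal as NaN, rank-correlate it with an observed dissimilarity matrix, and apply the Fisher z-transform. This is a minimal sketch, not the authors' analysis pipeline. The item vowels, the observed values, and all function names are assumptions, and Spearman's coefficient is computed here as the Pearson correlation of tie-averaged ranks.

```python
import numpy as np

def vowel_model_rdm(vowels):
    """0 where two different items share a vowel, 1 otherwise, NaN on the diagonal."""
    v = np.asarray(vowels)
    rdm = (v[:, None] != v[None, :]).astype(float)
    np.fill_diagonal(rdm, np.nan)
    return rdm

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    return (last_rank - (counts - 1) / 2.0)[inverse]

def fisher_z_spearman(model_rdm, observed_rdm):
    """Fisher-z-transformed Spearman correlation over the upper triangle."""
    upper = np.triu_indices_from(model_rdm, k=1)
    m, o = model_rdm[upper], observed_rdm[upper]
    keep = ~np.isnan(m) & ~np.isnan(o)  # drop excluded (NaN) cells
    rho = np.corrcoef(average_ranks(m[keep]), average_ranks(o[keep]))[0, 1]
    return np.arctanh(rho)

# Hypothetical items within one condition and a made-up observed matrix.
vowels = ["i", "i", "o", "o"]
observed = np.array([[np.nan, 0.2, 0.9, 0.8],
                     [0.2, np.nan, 0.7, 1.0],
                     [0.9, 0.7, np.nan, 0.3],
                     [0.8, 1.0, 0.3, np.nan]])
z = fisher_z_spearman(vowel_model_rdm(vowels), observed)
print(z)
```

A positive value indicates that items sharing a vowel have more similar patterns than items with different vowels. Repeating this per condition would give the four values that the caption compares across sensory detail and prior expectation.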

References

    1. von Helmholtz H, Nagel WA. Handbuch der physiologischen Optik: L. Voss; 1909.
    2. Clark A. Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science. Behav Brain Sci. 2012;36(3):181–204.
    3. Friston K. A theory of cortical responses. Philos Trans R Soc London [Biol]. 2005;360(1456):815–36.
    4. Arnal LH, Giraud AL. Cortical oscillations and sensory predictions. Trends Cogn Sci. 2012;16(7):390–8. doi: 10.1016/j.tics.2012.05.003
    5. Summerfield C, de Lange FP. Expectation in perceptual decision making: neural and computational mechanisms. Nat Rev Neurosci. 2014;15:745–56. doi: 10.1038/nrn3838