Vision Res. 2010 Mar 17;50(6):557-63.
doi: 10.1016/j.visres.2009.12.015. Epub 2010 Jan 15.

Human efficiency for classifying natural versus random text

Peter Neri et al. Vision Res.

Abstract

Humans are remarkably efficient at processing natural text. We quantified efficiency for discriminating a sample of meaningful text from a sample of random text by disrupting the meaningful sample, and measuring how much disruption human readers can tolerate before the two samples become indistinguishable. We performed these measurements for a wide range of conditions, involving samples of different lengths and containing letters, words or Chinese characters. We then compared human performance to the best possible performance achieved by a Bayesian estimator under the conditions in which we tested our participants, and in so doing we determined their absolute efficiency. Values were mostly in the range 5-40%, in agreement with reported efficiencies for many visual tasks. Although not intended as a veridical model of human processing, we found that the Bayesian model captured some (but not all) aspects of how humans classified text in our tasks and conditions.

Figures

Figure 1
Stimuli, task and metrics used in the experiments. (A) We generated a large database of text converted from written samples such as books and newspaper articles. The conversion only preserved the 26 letters of the alphabet and spaces between words. (B) Participants saw two strings on each trial and were asked to select the ‘target’ string (two-alternative forced choice). The ‘target’ string was extracted from the database, while the ‘non-target’ string was generated by randomly selecting elements from the database one at a time. Participants were typically able to identify the target correctly on all trials (100% correct) for this condition (leftmost point in C). We then replaced one or more elements in the ‘target’ string with randomly selected elements from the database. Randomly selected elements are indicated by black numbers. As the number of replaced elements increases (going from left to right in B and C), the percentage of correct responses decreases until it reaches chance (50% correct), shown by the psychometric curve in C. The number of replaced elements corresponding to 75% correct was taken as the basic threshold measurement (see Method). We repeated this measurement for different string lengths (the example in the figure is for length = 4), and plotted threshold number of replacements versus string length minus 1 (D). For all the conditions tested, points fell on a straight line in log-log units. We computed corresponding thresholds for an ideal estimator (solid line in D, see Method) and measured efficiency by taking the ratio between human and ideal thresholds. We computed average efficiency for thresholds corresponding to strings containing >5 elements (‘long’), and plotted it against efficiency for ‘short’ strings (≤5) for each subject (shown in E, where different points refer to different participants). Fig. 2 is based on panel D and Fig. 3 is based on panel E.
Figure 2
Replacement thresholds for the ‘word string’ (top row), Chinese ‘character string’ (middle row) and ‘letter string’ (bottom row) conditions. Panels A, C and E show data averaged across participants (corresponding data for individual participants S1–13 are shown in B, D and F). Thresholds are plotted on the y axis against number of string elements minus 1 on the x axis (refer to Figure 1D). Solid symbols for recognition task, open symbols for repetition task. Grey lines show ideal predictions for recognition (thick) and repetition (thin), not separately visible in A and C because they overlap. Error bars show ±2 s.e.m. (smaller than symbol when not visible).
Figure 3
Efficiency for strings containing >5 elements (y axis) versus ≤5 elements (x axis), where each element could be an English word (black), a Chinese character (light grey) or a Western letter (dark grey). Panel B magnifies a portion of A as indicated by the joining lines. Solid symbols for recognition task, open symbols for repetition task. Different data points refer to different participants (refer to Fig. 1E). Error bars show ±1 s.e.m.
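
The procedure described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`make_trial`, `threshold_at`, `efficiency`), the toy database and the example percentages are all hypothetical. Linear interpolation stands in for whatever psychometric fit the Method section specifies, and the direction of the efficiency ratio (human divided by ideal) is an assumption based on the caption's wording.

```python
import random


def make_trial(database, length, n_replaced, rng):
    """Build one (target, non_target) pair of element lists.

    The target is a contiguous run taken from the database, with
    n_replaced positions overwritten by randomly drawn elements.
    The non-target is drawn from the database one element at a time.
    """
    start = rng.randrange(len(database) - length + 1)
    target = list(database[start:start + length])
    for pos in rng.sample(range(length), n_replaced):
        target[pos] = rng.choice(database)
    non_target = [rng.choice(database) for _ in range(length)]
    return target, non_target


def threshold_at(levels, pct_correct, criterion=75.0):
    """Replacement count at which percent correct falls to the criterion.

    Assumption: linear interpolation between measured points, used here
    only as a stand-in for a fitted psychometric curve.
    """
    points = list(zip(levels, pct_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= criterion >= y1 and y0 != y1:
            return x0 + (y0 - criterion) * (x1 - x0) / (y0 - y1)
    raise ValueError("percent correct never crosses the criterion")


def efficiency(human_threshold, ideal_threshold):
    """Ratio of human to ideal threshold (assumed direction)."""
    return human_threshold / ideal_threshold


# Hypothetical data, for illustration only.
rng = random.Random(0)
database = "the quick brown fox jumps over the lazy dog".split()
target, non_target = make_trial(database, length=4, n_replaced=1, rng=rng)

human = threshold_at([0, 1, 2, 3, 4], [100, 90, 70, 55, 50])
print(human)                   # 1.75
print(efficiency(human, 3.5))  # 0.5
```

With these made-up numbers the 75% point falls between one and two replacements, and an ideal threshold of 3.5 replacements would correspond to an efficiency of 50%.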
