Human efficiency for classifying natural versus random text

Peter Neri¹, Alicia Liu, Dennis M Levi

PMID: 20079757
PMCID: PMC2832918
DOI: 10.1016/j.visres.2009.12.015

Peter Neri et al. Vision Res. 2010 Mar 17;50(6):557-63.

doi: 10.1016/j.visres.2009.12.015. Epub 2010 Jan 15.

Affiliation

¹ Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom. pn@white.stanford.edu

Abstract

Humans are remarkably efficient at processing natural text. We quantified efficiency for discriminating a sample of meaningful text from a sample of random text by disrupting the meaningful sample and measuring how much disruption human readers can tolerate before the two samples become indistinguishable. We performed these measurements for a wide range of conditions, involving samples of different lengths that contained letters, words or Chinese characters. We then compared human performance to the best possible performance, achieved by a Bayesian estimator under the same conditions in which we tested our participants, and in so doing determined their absolute efficiency. Values were mostly in the range 5-40%, in agreement with reported efficiencies for many visual tasks. Although the Bayesian model was not intended as a veridical model of human processing, we found that it captured some (but not all) aspects of how humans classified text in our tasks and conditions.
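The abstract does not spell out the Bayesian estimator, so the following is only a hypothetical illustration of how an ideal-observer-style decision could be made on a single two-alternative trial: compute, for each string, the log likelihood ratio between a "natural text" model and a "random text" model, and pick the string with the larger ratio. The bigram model with add-one smoothing for natural text is our assumption, not the estimator described in the paper's Method; the unigram model for random text follows from non-target strings being built by drawing database elements one at a time. The names `train_models`, `log_likelihood_ratio` and `choose_target` are hypothetical.

```python
import math
from collections import Counter


def train_models(corpus):
    """Count unigrams, bigram contexts and bigrams in a token sequence.

    ASSUMPTION: a bigram model stands in for 'natural text'; this is an
    illustrative choice, not the estimator used in the paper.
    """
    unigrams = Counter(corpus)
    contexts = Counter(corpus[:-1])  # tokens that have a successor
    bigrams = Counter(zip(corpus, corpus[1:]))
    return unigrams, contexts, bigrams


def log_likelihood_ratio(string, unigrams, contexts, bigrams):
    """log P(string | natural) - log P(string | random).

    The first element is assumed to be drawn from the same unigram
    distribution under both hypotheses, so its terms cancel and are skipped.
    """
    total = sum(unigrams.values())
    vocab = len(unigrams)
    llr = 0.0
    for prev, cur in zip(string, string[1:]):
        # Add-one smoothed bigram probability P(cur | prev) under 'natural'
        p_natural = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab)
        # Add-one smoothed unigram probability under 'random' (independent draws)
        p_random = (unigrams[cur] + 1) / (total + vocab)
        llr += math.log(p_natural) - math.log(p_random)
    return llr


def choose_target(a, b, models):
    """2AFC decision: return 0 if string a is judged to be natural text, else 1."""
    llr_a = log_likelihood_ratio(a, *models)
    llr_b = log_likelihood_ratio(b, *models)
    return 0 if llr_a >= llr_b else 1
```

For example, after `models = train_models("the cat sat on the mat the cat sat on the mat".split())`, calling `choose_target(["the", "cat", "sat", "on"], ["on", "the", "sat", "cat"], models)` returns 0, because the intact phrase is far more probable under the bigram model than under independent draws.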

Figures

**Figure 1**
Stimuli, task and metrics used in the experiments. (A) We generated a large database of text converted from written samples such as books and newspaper articles. The conversion preserved only the 26 letters of the alphabet and the spaces between words. (B) Participants saw two strings on each trial and were asked to select the ‘target’ string (two-alternative forced choice). The ‘target’ string was extracted from the database, while the ‘non-target’ string was generated by randomly selecting elements from the database one at a time. Participants were typically able to identify the target correctly on all trials (100% correct) for this condition (leftmost point in C). We then replaced one or more elements in the ‘target’ string with randomly selected elements from the database. Randomly selected elements are indicated by black numbers. As the number of replaced elements increases (going from left to right in B and C), the percentage of correct responses decreases until it reaches chance (50% correct), as shown by the psychometric curve in C. The number of replaced elements corresponding to 75% correct was taken as the basic threshold measurement (see Method). We repeated this measurement for different string lengths (the example in the figure is for length = 4) and plotted the threshold number of replacements against string length minus 1 (D). For all the conditions tested, points fell on a straight line in log-log coordinates. We computed corresponding thresholds for an ideal estimator (solid line in D, see Method) and measured efficiency as the ratio between human and ideal thresholds. For each participant, we computed average efficiency for strings containing >5 elements (‘long’) and plotted it against average efficiency for ‘short’ strings (≤5 elements) (E, where different points refer to different participants). Figure 2 is based on panel D and Figure 3 on panel E.
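The trial construction and threshold measurement described in panels A to C can be sketched as follows. This assumes the database is a Python list of elements (letters, words or characters) in reading order, and that "randomly selecting elements from the database" means drawing positions uniformly, which samples elements in proportion to their frequency. The 75% threshold is estimated here by linear interpolation between measured points; the paper's Method may instead fit a psychometric function, so `threshold_75` is a hypothetical stand-in, as is `make_trial`.

```python
import random


def make_trial(database, length, n_replace, rng=random):
    """Build one hypothetical 2AFC trial from a list of text elements.

    Returns (target, non_target). The target is a contiguous excerpt with
    n_replace distinct positions overwritten by randomly drawn elements; the
    non-target is made entirely of randomly drawn elements.
    """
    start = rng.randrange(len(database) - length + 1)
    target = list(database[start:start + length])
    for pos in rng.sample(range(length), n_replace):
        # Drawing a random database position samples elements by frequency
        target[pos] = rng.choice(database)
    non_target = [rng.choice(database) for _ in range(length)]
    return target, non_target


def threshold_75(n_replaced, prop_correct, criterion=0.75):
    """Interpolate the number of replacements at which accuracy falls to criterion.

    Assumes n_replaced is increasing and accuracy crosses the criterion once.
    """
    points = list(zip(n_replaced, prop_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == criterion:
            return float(x0)
        if y0 > criterion > y1:
            return x0 + (y0 - criterion) * (x1 - x0) / (y0 - y1)
    if points and points[-1][1] == criterion:
        return float(points[-1][0])
    raise ValueError("accuracy never falls through the criterion")
```

With made-up accuracies, `threshold_75([0, 1, 2, 3, 4], [1.0, 0.95, 0.8, 0.6, 0.5])` gives 2.25 (up to floating-point rounding), a quarter of the way from 2 to 3 replacements. Passing a seeded `random.Random` instance as `rng` makes trials reproducible.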

**Figure 2**
Replacement thresholds for the ‘word string’ (top row), Chinese ‘character string’ (middle row) and ‘letter string’ (bottom row) conditions. Panels A, C and E show data averaged across participants (corresponding data for individual participants S1–13 are shown in B, D and F). Thresholds are plotted on the y axis against number of string elements minus 1 on the x axis (refer to Figure 1D). Solid symbols for recognition task, open symbols for repetition task. Grey lines show ideal predictions for recognition (thick) and repetition (thin); these are not separately visible in A and C because they overlap. Error bars show ±2 s.e.m. (smaller than symbol when not visible).
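Figure 1D reports that thresholds fall on a straight line when plotted against string length minus 1 in log-log coordinates. One minimal way to summarise such data is an ordinary least-squares line through the log-transformed values, log10 T = a + b·log10(n − 1), which corresponds to the power law T = 10^a·(n − 1)^b. This excerpt does not say how (or whether) the authors fitted such lines, so `fit_loglog` is a hypothetical helper and the example values are synthetic.

```python
import math


def fit_loglog(n_elements, thresholds):
    """Ordinary least-squares fit of log10(threshold) against log10(n - 1).

    Returns (slope, intercept) such that threshold ~ 10**intercept * (n - 1)**slope.
    Requires every n >= 2 and every threshold > 0.
    """
    xs = [math.log10(n - 1) for n in n_elements]
    ys = [math.log10(t) for t in thresholds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic example (not data from the paper): an exact power law with slope 0.8
lengths = [2, 3, 5, 9, 17]
thresholds = [0.5 * (n - 1) ** 0.8 for n in lengths]
slope, intercept = fit_loglog(lengths, thresholds)
```

Because the synthetic thresholds follow an exact power law, the fit recovers a slope of 0.8 and an intercept of log10(0.5); on real data the slope summarises how tolerance to replacement grows with string length.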

**Figure 3**
Efficiency for strings containing >5 elements (y axis) versus ≤5 elements (x axis), where each element could be an English word (black), a Chinese character (light grey) or a Western letter (dark grey). Panel B magnifies a portion of A as indicated by the joining lines. Solid symbols for recognition task, open symbols for repetition task. Different data points refer to different participants (refer to Fig. 1E). Error bars show ±1 s.e.m.
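Figure 1 defines efficiency as the ratio between human and ideal thresholds and summarises each participant by averaging it separately over ‘short’ (≤5 elements) and ‘long’ (>5 elements) strings, giving one point per participant in this figure. A sketch of that bookkeeping follows; the use of an arithmetic mean and the name `efficiency_by_length` are assumptions, and the example values are made up.

```python
from statistics import fmean


def efficiency_by_length(n_elements, human, ideal, cutoff=5):
    """Mean efficiency for short (n <= cutoff) and long (n > cutoff) strings.

    Efficiency at each string length is human threshold / ideal threshold.
    ASSUMPTION: lengths are combined with an arithmetic mean.
    Returns (short_mean, long_mean); a group with no lengths gives NaN.
    """
    short, long_ = [], []
    for n, h, i in zip(n_elements, human, ideal):
        (long_ if n > cutoff else short).append(h / i)
    short_mean = fmean(short) if short else float("nan")
    long_mean = fmean(long_) if long_ else float("nan")
    return short_mean, long_mean
```

For instance, `efficiency_by_length([2, 4, 8, 16], [1, 2, 3, 4], [2, 4, 12, 8])` returns `(0.5, 0.375)`: the two short lengths both give 0.5, while the long lengths give 0.25 and 0.5.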
