2023 Mar 27;20(1):123-152.
doi: 10.1515/cllt-2022-0082. eCollection 2024 Feb.

Large-scale patterns of number use in spoken and written English

Greg Woodin et al. Corpus Linguist Linguist Theory.

Abstract

This paper describes patterns of number use in spoken and written English and the main factors that contribute to these patterns. We analysed more than 1.7 million occurrences of numbers between 0 and a billion in the British National Corpus, including conversational speech, presentational speech (e.g., lectures, interviews), imaginative writing (e.g., fiction), and informative writing (e.g., academic books). We find that four main factors affect number frequency: (1) Magnitude: smaller numbers are more frequent than larger numbers; (2) Roundness: round numbers are more frequent than unround numbers of a comparable magnitude, and some round numbers are more frequent than others; (3) Cultural salience: culturally salient numbers (e.g., recent years) are more frequent than non-salient numbers; and (4) Register: more informational texts contain more numbers (in writing), types of numbers, decimals, and larger numbers than less informational texts. In writing, we find that the numbers 1–9 are mostly represented by number words (e.g., 'three'), 10–999,999 are mostly represented by numerals (e.g., '14'), and 1 million–1 billion are mostly represented by a mix of numerals and number words (e.g., '8 million'). Altogether, this study builds a detailed profile of number use in spoken and written English.

Keywords: big data; number frequencies; numerical cognition; register studies; rounding.
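The abstract's finding about written formats can be read as a simple lookup by magnitude. The sketch below is illustrative only: the function name `typical_written_format` and its return labels are assumptions made here, not part of the study, and the ranges describe what the abstract reports as the most common format, not a rule that every text follows. The abstract does not state a dominant format for 0, so the sketch leaves it unspecified.

```python
def typical_written_format(n: int) -> str:
    """Return the format the abstract reports as most common for n in writing.

    Hypothetical helper: the labels are illustrative, and "mostly" in the
    abstract means these are tendencies, not categorical rules.
    """
    if not 0 <= n <= 1_000_000_000:
        raise ValueError("the study covers numbers between 0 and a billion")
    if 1 <= n <= 9:
        return "number word"        # e.g., 'three'
    if 10 <= n <= 999_999:
        return "numeral"            # e.g., '14'
    if n >= 1_000_000:
        return "numeral + word"     # e.g., '8 million'
    return "unspecified"            # 0: not covered by the abstract's summary


for value in (3, 14, 8_000_000):
    print(value, "->", typical_written_format(value))
```

A real analysis would estimate these tendencies from corpus counts per format rather than hard-coding them.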

Figures

Figure 1:
Frequencies for all integers that appear in the British National Corpus in any representational format. Both axes are base-10 logarithmically scaled (log10). Non-multiples and multiples of 10, 50, 100, 500, 1,000, 10,000, and 100,000 are color coded on a categorical scale from dark blue to light turquoise (see legend). The number 0 is in red to highlight that we have manually coded its log10 value as –0.1 to visualize it on a log10 scale, as the logarithm of 0 is not defined. The numbers 0, 1, 2, 3, 100, 1,000, 1,000,000, and 1,000,000,000 are circled and labelled.
Figure 2:
Proportions of number tokens in different log10 number ranges in the spoken and written subcorpora. Proportions are out of all number tokens in each subcorpus.
Figure 3:
Proportions of number tokens in each log10 number range that were written in different representational formats. Proportions are out of all number tokens in each log10 number range.
Figure 4:
Roundness as a radial category, where green indicates roundness and white indicates the absence of roundness. The most prototypical or ‘roundest’ numbers are in the center circle (i.e., numbers with all six roundness properties: 10-ness, 2-ness, 2.5-ness, 5-ness, being a multiple of ten, and being a multiple of five). Progressing outward through the inner rings are less prototypical or ‘less round’ numbers with five, four, three, two, or one roundness properties. The outer ring shows numbers that have no roundness properties.
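Two conventions from the Figure 1 caption lend themselves to a short sketch: plotting 0 on a log10 axis by coding its value as -0.1, and color coding integers by the multiples listed in the legend. The code below is a minimal illustration, not the authors' plotting code. The function names are hypothetical, and the tie-breaking rule (assign each integer to the largest listed base that divides it) is an assumption, since the caption does not say how overlapping categories are resolved.

```python
import math

# Bases named in the Figure 1 caption, largest first.
BASES = (100_000, 10_000, 1_000, 500, 100, 50, 10)


def log10_for_plot(n: int) -> float:
    """log10 of n, with 0 manually coded as -0.1 as the caption describes."""
    if n < 0:
        raise ValueError("only non-negative integers are plotted")
    return -0.1 if n == 0 else math.log10(n)


def multiple_category(n: int) -> str:
    """Assign n to a color category (assumed rule: largest dividing base)."""
    if n == 0:
        return "zero"  # shown separately (in red) in the figure
    for base in BASES:
        if n % base == 0:
            return f"multiple of {base}"
    return "non-multiple"


for value in (0, 7, 250, 1_500, 1_000_000):
    print(value, round(log10_for_plot(value), 3), multiple_category(value))
```

Under the assumed rule, 1,500 falls under multiples of 500 and 250 under multiples of 50. A different resolution of overlaps would change those assignments.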
