Educ Psychol Meas. 2015 Jun;75(3):365-388.

doi: 10.1177/0013164414548576. Epub 2014 Sep 15.

Descriptive Statistics for Modern Test Score Distributions: Skewness, Kurtosis, Discreteness, and Ceiling Effects

Andrew D Ho¹, Carol C Yu¹

Affiliation

¹ Harvard Graduate School of Education, Cambridge, MA, USA.

PMID: 29795825
PMCID: PMC5965643
DOI: 10.1177/0013164414548576

Abstract

Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is rarely met in educational and psychological practice. In this article, the authors extend these previous analyses to state-level educational test score distributions that are an increasingly common target of high-stakes analysis and interpretation. Among 504 scale-score and raw-score distributions from state testing programs from recent years, nonnormal distributions are common and are often associated with particular state programs. The authors explain how scaling procedures from item response theory lead to nonnormal distributions as well as unusual patterns of discreteness. The authors recommend that distributional descriptive statistics be calculated routinely to inform model selection for large-scale test score data, and they illustrate consequences of nonnormality using sensitivity studies that compare baseline results to those from normalized score scales.

Keywords: accountability; descriptive statistics; exploratory data analysis; high-stakes testing; psychometrics.
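As an illustration of the routine descriptive statistics the abstract recommends, the following minimal Python sketch computes skewness and kurtosis from a table of discrete score frequencies. It is not the authors' code: the function name `describe_scores` and the frequency table `toy` are hypothetical, and population (divide-by-n) moments are assumed. Kurtosis is reported in its non-excess form, so a normal distribution would give 3, which matches the kurtosis range quoted in the Figure 3 caption.

```python
import math

def describe_scores(freq):
    """Summarize a {score: count} table (hypothetical helper, not from the article).

    Kurtosis is the non-excess form, so a normal distribution gives 3.
    """
    n = sum(freq.values())
    mean = sum(x * c for x, c in freq.items()) / n
    # Central moments of order 2, 3, and 4 (population form, dividing by n).
    m2, m3, m4 = (
        sum(c * (x - mean) ** p for x, c in freq.items()) / n
        for p in (2, 3, 4)
    )
    return {
        "n": n,
        "mean": mean,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "n_score_points": len(freq),
    }

# Hypothetical scale-score frequencies with many examinees near the top score.
toy = {300: 5, 320: 20, 340: 60, 360: 120, 380: 150, 400: 95}
stats = describe_scores(toy)
print({name: round(value, 3) for name, value in stats.items()})
```

Working from a frequency table rather than individual records keeps the number of distinct score points visible alongside the moments, which is useful when discreteness is one of the features of interest.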

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1.**
Skewness and kurtosis of raw score (gray, n = 174) and scale score (black, n = 330) distributions from 14 state testing programs, Grades 3 to 8, reading and mathematics, 2010 and 2011. *Note*. Distributions with kurtosis >5 are labeled with their state abbreviations: CO = Colorado; NY0 = New York, 2010; NY1 = New York, 2011; OK = Oklahoma; PA = Pennsylvania; WA = Washington. The theoretical lower bound of skewness and kurtosis is shown as a solid curve. The skewness and kurtosis of beta-binomial distributions are shown as a dashed line as a function of the average item proportion correct, µ_p, under the constraint that the test comprises 50 dichotomously scored items with item difficulties distributed as a beta distribution with parameters α and β that sum to 4.
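The beta-binomial reference curve described in the Figure 1 caption can be approximated with a short Python sketch. This is an illustration under stated assumptions rather than the authors' procedure: it uses the standard beta-binomial probability mass function with 50 items, and it assumes the parametrization α = 4µ_p and β = 4(1 − µ_p) so that α + β = 4 and the beta mean equals µ_p. The function names are hypothetical. The printed bound uses the general inequality kurtosis ≥ skewness² + 1, which holds for any distribution.

```python
import math

def beta_binomial_pmf(n, alpha, beta):
    """Probabilities of raw scores 0..n under a beta-binomial model."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return [
        math.exp(
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
        )
        for k in range(n + 1)
    ]

def skew_kurt(pmf):
    """Skewness and non-excess kurtosis of a pmf over scores 0..len(pmf)-1."""
    mean = sum(k * p for k, p in enumerate(pmf))
    m2, m3, m4 = (
        sum(p * (k - mean) ** r for k, p in enumerate(pmf)) for r in (2, 3, 4)
    )
    return m3 / m2 ** 1.5, m4 / m2 ** 2

N_ITEMS, AB_SUM = 50, 4  # values taken from the Figure 1 caption

for mu_p in (0.3, 0.5, 0.7, 0.9):
    # Assumed parametrization: alpha = 4 * mu_p, beta = 4 * (1 - mu_p).
    pmf = beta_binomial_pmf(N_ITEMS, AB_SUM * mu_p, AB_SUM * (1 - mu_p))
    s, k = skew_kurt(pmf)
    print(f"mu_p={mu_p:.1f}  skewness={s:+.3f}  kurtosis={k:.3f}  bound={s * s + 1:.3f}")
```

Under this parametrization, easier tests (larger µ_p) produce negatively skewed raw-score distributions, which is the direction associated with ceiling effects.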

**Figure 2.**
Skewness and kurtosis of scale score distributions from 14 state testing programs, Grades 3 to 8, reading and mathematics, 2010 and 2011, shown as boxplots by state abbreviations. *Note*. States are abbreviated. AK = Alaska; AZ = Arizona; CO = Colorado; ID = Idaho; NE = Nebraska; 4S = New England Common Assessment Program (Maine, New Hampshire, Rhode Island, Vermont); NJ = New Jersey; NY0 = New York, 2010; NY1 = New York, 2011; NC = North Carolina; OK = Oklahoma; PA = Pennsylvania; SD = South Dakota; TX = Texas; WA = Washington.

**Figure 3.**
Six discrete histograms of scale scores selected from a pool of 46 symmetric, mesokurtic distributions with skewness between ±0.1 and kurtosis between 2.75 and 3.75. Distributions chosen to illustrate characteristically stretched, high-density upper tails in spite of near-zero skewness. *Note.* All distributions are from 2011. s = skewness; k = kurtosis.

**Figure 4.**
Upper-tail features of scale score distributions from 14 state testing programs, Grades 3 to 8, reading and mathematics, 2010 and 2011. *Note.* Top tile: Count of discrete score points distinguishing among the top 10% of examinees. Middle tile: Percentage of total discrete score points distinguishing among the top 10% of examinees. Bottom tile: Distance from the second highest score point to the highest score point in standard deviation units. States are abbreviated. AK = Alaska; AZ = Arizona; CO = Colorado; ID = Idaho; NE = Nebraska; 4S = New England Common Assessment Program (Maine, New Hampshire, Rhode Island, Vermont); NJ = New Jersey; NY0 = New York, 2010; NY1 = New York, 2011; NC = North Carolina; OK = Oklahoma; PA = Pennsylvania; SD = South Dakota; TX = Texas; WA = Washington.
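The three upper-tail features in Figure 4 can be sketched in Python as follows. The caption does not spell out the exact computation, so the operationalization here is an assumption: score points are counted from the highest downward until they cover at least the top 10% of examinees. The function name `upper_tail_features` and the example table are hypothetical, and the table must contain at least two distinct score points.

```python
import math

def upper_tail_features(freq, top_share=0.10):
    """Upper-tail summaries for a {score: count} table (assumed definitions)."""
    n = sum(freq.values())
    mean = sum(x * c for x, c in freq.items()) / n
    sd = math.sqrt(sum(c * (x - mean) ** 2 for x, c in freq.items()) / n)
    points = sorted(freq, reverse=True)  # highest score point first

    # Walk down from the top until the requested share of examinees is covered.
    covered, top_points = 0, 0
    for x in points:
        if covered >= top_share * n:
            break
        covered += freq[x]
        top_points += 1

    return {
        # Count of score points separating examinees within the top share.
        "top_points": top_points,
        # The same count as a percentage of all observed score points.
        "top_points_pct": 100 * top_points / len(points),
        # Gap between the two highest score points, in SD units.
        "top_gap_sd": (points[0] - points[1]) / sd,
    }

# Hypothetical scale-score frequencies with a pronounced ceiling.
toy = {300: 5, 320: 20, 340: 60, 360: 120, 380: 150, 400: 95}
features = upper_tail_features(toy)
print({name: round(value, 3) for name, value in features.items()})
```

In this hypothetical table a single score point holds more than 10% of examinees, so the top decile cannot be differentiated at all, which is the kind of ceiling effect these features are meant to expose.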

Cited by

Overfactoring in rating scale data: A comparison between factor analysis and item response theory.
Revuelta J, Ximénez C, Minaya N. Front Psychol. 2022 Nov 30;13:982137. doi: 10.3389/fpsyg.2022.982137. eCollection 2022. PMID: 36533017. Free PMC article.
Compas-Y: A mixed methods pilot evaluation of a mobile self-compassion training for people with newly diagnosed cancer.
Austin J, Schroevers MJ, Van Dijk J, Sanderman R, Børøsund E, Wymenga AMN, Bohlmeijer ET, Drossaert CHC. Digit Health. 2023 Oct 19;9:20552076231205272. doi: 10.1177/20552076231205272. eCollection 2023 Jan-Dec. PMID: 37868157. Free PMC article.
Examination of ChatGPT's Performance as a Data Analysis Tool.
Koçak D. Educ Psychol Meas. 2025 Jan 3:00131644241302721. doi: 10.1177/00131644241302721. Online ahead of print. PMID: 39759537. Free PMC article.
The development and validation of the hospital organizational environment scale for medical staff in China.
Wang Y, Zhang J, Feng X, Liang Y, Guan Z, Meng K. Front Public Health. 2023 Sep 21;11:1118337. doi: 10.3389/fpubh.2023.1118337. eCollection 2023. PMID: 37809008. Free PMC article. Review.
Development and psychometric evaluation of the family intensive care units syndrome inventory.
Saeid Y, Ebadi A, Salaree MM, Moradian ST. Brain Behav. 2023 Jul;13(7):e3101. doi: 10.1002/brb3.3101. Epub 2023 Jun 6. PMID: 37279159. Free PMC article. Review.
