Educ Psychol Meas. 2015 Oct;75(5):785-804.
doi: 10.1177/0013164414557639. Epub 2014 Nov 11.

Reducing Bias and Error in the Correlation Coefficient Due to Nonnormality


Anthony J Bishara et al. Educ Psychol Meas. 2015 Oct.

Abstract

It is more common for educational and psychological data to be nonnormal than to be approximately normal. This tendency may lead to bias and error in point estimates of the Pearson correlation coefficient. In a series of Monte Carlo simulations, the Pearson correlation was examined under conditions of normal and nonnormal data, and it was compared with its major alternatives, including the Spearman rank-order correlation, the bootstrap estimate, the Box-Cox transformation family, and a general normalizing transformation (i.e., rankit), as well as with various bias adjustments. Nonnormality caused the correlation coefficient to be inflated by up to +.14, particularly when the nonnormality involved heavy-tailed distributions. Traditional bias adjustments worsened this problem, further inflating the estimate. The Spearman and rankit correlations eliminated this inflation and provided conservative estimates. Rankit also minimized random error for most sample sizes, except for the smallest samples (n = 10), where bootstrapping was more effective. Overall, results justify the use of carefully chosen alternatives to the Pearson correlation when normality is violated.

Keywords: Pearson; Spearman; correlation; nonnormal; normality; transformation.
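
To make the alternatives named in the abstract concrete, the following is a minimal Python sketch of the Pearson, Spearman, and rankit correlations. It is an illustration only, not the authors' code. It assumes the usual rankit definition, in which each average rank r is mapped to the standard normal quantile of (r - 0.5)/n; the function names (average_ranks, rankit, pearson, spearman, rankit_correlation) are hypothetical and do not come from the article.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rankit(values):
    """Rank-based inverse normal transform (assumed rankit form: (r - 0.5) / n)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


def pearson(x, y):
    """Pearson product-moment correlation (undefined if either input is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rankit_correlation(x, y):
    """Pearson correlation of the rankit-transformed variables."""
    return pearson(rankit(x), rankit(y))


if __name__ == "__main__":
    # Made-up data with one extreme value in each variable.
    x = [1.2, 0.4, 3.1, 2.2, 50.0, 0.9, 1.7, 2.8]
    y = [0.8, 0.1, 2.5, 2.9, 40.0, 1.1, 1.0, 3.3]
    print("Pearson :", pearson(x, y))
    print("Spearman:", spearman(x, y))
    print("Rankit  :", rankit_correlation(x, y))
```

Because both Spearman and rankit depend only on the ranks, any strictly increasing transformation of x or y leaves them unchanged, which is why extreme values affect them far less than the raw Pearson r.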


Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1. Normal and nonnormal distribution shapes used in simulations.
Figure 2. Bias of the Pearson r as a function of sample size (n = 10-160), true population correlation (ρ = 0-.75), and distribution shapes of the X and Y variables. Note. The 95% confidence intervals of the mean for bias estimates were ±.005 at most. HvyTail = heavy-tailed; ExtrSkew = extremely skewed; SltSkew = slightly skewed.
Figure 3. The range of bias across scenarios illustrates the worst possible positive and negative biases of various statistical approaches. Note. Unadj. = unadjusted for bias; FAU = Fisher approximately unbiased adjustment; OP = Olkin and Pratt adjustment; RIN = rank-based inverse normal transformation.
Figure 4. Mean bias as a function of statistical approaches and distribution shapes. Note. HvyTail = heavy-tailed; ExtrSkew = extremely skewed; SltSkew = slightly skewed; Unadj. = unadjusted for bias; FAU = Fisher approximately unbiased adjustment; OP = Olkin and Pratt adjustment; RIN = rank-based inverse normal transformation.
Figure 5. Mean RMSE as a function of bias adjustment and sample size. Note. The 95% confidence intervals of the mean for RMSE estimates were ±.004 at most. RMSE = root mean squared error; Unadj. = unadjusted for bias; FAU = Fisher approximately unbiased adjustment; OP = Olkin and Pratt adjustment.
Figure 6. Mean RMSE as a function of statistical approach and sample size. Note. RMSE = root mean squared error; RIN = rank-based inverse normal transformation.
Figure 7. Mean RMSE among major approaches as a function of sample size and distribution shape. Note. RMSE = root mean squared error; RIN = rank-based inverse normal transformation; HvyTail = heavy-tailed; ExtrSkew = extremely skewed; SltSkew = slightly skewed.
Figure 8. Mean RMSE as a function of statistical approach and true population correlation (ρ). Note. RMSE = root mean squared error; RIN = rank-based inverse normal transformation.
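
The figure captions report bias and RMSE for the unadjusted r and for the FAU and OP adjustments. The Python sketch below shows how such quantities could be computed in a small Monte Carlo run. It is a hedged illustration, not the study's procedure: the two adjustment formulas are the commonly cited approximate forms, r[1 + (1 - r²)/(2n)] for Fisher and r[1 + (1 - r²)/(2(n - 3))] for Olkin and Pratt, and are assumed here because this page does not reproduce the article's formulas. The simulation draws bivariate normal data only, and all function names and the seed parameter are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_adjust(r, n):
    """Assumed form of the Fisher approximately unbiased (FAU) adjustment."""
    return r * (1 + (1 - r ** 2) / (2 * n))


def olkin_pratt_adjust(r, n):
    """Assumed approximate form of the Olkin and Pratt (OP) adjustment."""
    return r * (1 + (1 - r ** 2) / (2 * (n - 3)))


def simulate_pearson(rho, n, reps, seed=0):
    """Draw `reps` bivariate normal samples of size n and return each sample's r."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y has correlation rho with x in the population.
        y = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x]
        estimates.append(pearson(x, y))
    return estimates


def bias_and_rmse(estimates, rho):
    """Mean signed error (bias) and root mean squared error around rho."""
    errors = [e - rho for e in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    return bias, rmse


if __name__ == "__main__":
    rho, n = 0.5, 10
    raw = simulate_pearson(rho, n, reps=5000, seed=1)
    for label, ests in [
        ("Unadj.", raw),
        ("FAU", [fisher_adjust(r, n) for r in raw]),
        ("OP", [olkin_pratt_adjust(r, n) for r in raw]),
    ]:
        bias, rmse = bias_and_rmse(ests, rho)
        print(f"{label:7s} bias={bias:+.4f} rmse={rmse:.4f}")
```

Both adjustments, as written here, move r away from zero. That is consistent with the abstract's observation that an estimate already inflated by nonnormality is inflated further by traditional bias adjustments.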

