Multicollinearity and misleading statistical results
- PMID: 31304696
- PMCID: PMC6900425
- DOI: 10.4097/kja.19087
Abstract
Multicollinearity is a high degree of linear intercorrelation among the explanatory variables in a multiple regression model, and it leads to misleading results in regression analyses. Diagnostic tools for multicollinearity include the variance inflation factor (VIF), the condition index and condition number, and the variance decomposition proportion (VDP). Multicollinearity can be expressed by the coefficient of determination ($R_h^2$) of a multiple regression model that takes one explanatory variable ($X_h$) as its response variable and the others ($X_i$, $i \neq h$) as its explanatory variables. The VIF is $1 / (1 - R_h^2)$, and the variance ($\sigma_h^2$) of each regression coefficient in the final regression model is proportional to it. Hence, an increase in $R_h^2$ (strong multicollinearity) increases $\sigma_h^2$. A larger $\sigma_h^2$ produces unreliable probability values and confidence intervals for the regression coefficients. The condition index is the square root of the ratio of the maximum eigenvalue to each eigenvalue of the correlation matrix of the standardized explanatory variables. The condition number is the maximum condition index. Multicollinearity is considered present when the VIF is higher than 5 to 10 or the condition indices are higher than 10 to 30. However, these measures do not by themselves identify which explanatory variables are multicollinear with one another. VDPs, obtained from the eigenvectors, can identify the multicollinear variables by showing how much of the inflation of $\sigma_h^2$ is attributable to each condition index. When two or more VDPs that correspond to a common condition index higher than 10 to 30 are themselves higher than 0.8 to 0.9, their associated explanatory variables are multicollinear. Excluding multicollinear explanatory variables leads to statistically stable multiple regression models.
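The three diagnostics described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `collinearity_diagnostics`, the simulated variables `x1` to `x4`, and the noise level are all assumptions made for the example. It follows the abstract in working from the correlation matrix of the standardized explanatory variables (other formulations scale the design matrix differently and can give different condition indices).

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return (VIF, condition indices, VDP matrix) for the columns of X.

    Hypothetical helper for illustration; rows of X are observations,
    columns are explanatory variables.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Standardize each explanatory variable (zero mean, unit variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the standardized explanatory variables.
    R = (Z.T @ Z) / (n - 1)
    # VIF_h = 1 / (1 - R_h^2), which equals the h-th diagonal of inv(R).
    vif = np.diag(np.linalg.inv(R))
    # Eigenvalues and eigenvectors of the symmetric correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(R)
    # Condition index: sqrt(maximum eigenvalue / each eigenvalue).
    cond_index = np.sqrt(eigvals.max() / eigvals)
    # phi[h, k] = v_hk^2 / lambda_k; each row sums to VIF_h.
    phi = eigvecs**2 / eigvals
    # VDP: share of coefficient h's variance tied to eigenvalue k.
    vdp = phi / phi.sum(axis=1, keepdims=True)
    return vif, cond_index, vdp

# Simulated data (assumption): x3 is almost exactly x1 + x2, x4 is unrelated.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)
x4 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

vif, cond_index, vdp = collinearity_diagnostics(X)
k = int(np.argmax(cond_index))  # column holding the condition number

print("VIF:", np.round(vif, 1))
print("Condition indices:", np.round(cond_index, 1))
print("VDPs at the largest condition index:", np.round(vdp[:, k], 3))
```

In this simulated setting, the VIFs of `x1`, `x2`, and `x3` should be far above the 5 to 10 range while that of `x4` stays near 1, and the VDPs at the largest condition index should single out the first three variables as the multicollinear set.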
Keywords: Biomedical research; Biostatistics; Multivariable analysis; Regression; Statistical bias; Statistical data analysis.
Conflict of interest statement
No potential conflict of interest relevant to this article was reported.